推荐系统netflix获奖算法

发布时间:2021-06-07

赢得netflix推荐系统大奖的算法

1

TheBellKorSolutiontotheNet ixGrandPrize

YehudaKorenAugust2009

I.INTRODUCTION

Thisarticledescribespartofourcontributiontothe“Bell-Kor’sPragmaticChaos” nalsolution,whichwontheNet ixGrandPrize.TheotherportionofthecontributionwascreatedwhileworkingatAT&TwithRobertBellandChrisVolinsky,asreportedinour2008ProgressPrizereport[3].The nalsolutionincludesallthepredictorsdescribedthere.Inthisarticlewedescribeonlythenewerpredictors.

Sowhatisnewoverlastyear’ssolution?Firstwefurtherim-provedthebaselinepredictors(Sec.III).Thisinturnimprovesourothermodels,whichincorporatethosepredictors,likethematrixfactorizationmodel(Sec.IV).Inaddition,anextensionoftheneighborhoodmodelthataddressestemporaldynamicswasintroduced(Sec.V).OntheRestrictedBoltzmannMa-chines(RBM)front,weuseanewRBMmodelwithsuperioraccuracybyconditioningthevisibleunits(Sec.VI).The naladditionistheintroductionofanewblendingalgorithm,whichisbasedongradientboosteddecisiontrees(GBDT)(Sec.VII).

II.PRELIMINARIES

TheNet ixdatasetcontainsmorethan100milliondate-stampedmovieratingsperformedbyanonymousNet ixcus-tomersbetweenDec31,1999andDec31,2005[4].Thisdatasetgivesratingsaboutm=480,189usersandn=17,770movies(aka,items).

Thecontestwasdesignedinatraining-testsetformat.AHold-outsetofabout4.2millionratingswascreatedconsistingofthelastninemoviesratedbyeachuser(orfewerifauserhadnotratedatleast18moviesovertheentireperiod).Theremainingdatamadeupthetrainingset.TheHold-outsetwasrandomlysplitthreeways,intosubsetscalledProbe,Quiz,andTest.TheProbesetwasattachedtothetrainingset,andlabels(theratingthattheusergavethemovie)wereattached.TheQuizandTestsetsmadeupanevaluationset,whichisknownastheQualifyingset,thatcompetitorswererequiredtopredictratingsfor.Onceacompetitorsubmitspre-dictions,theprizemasterreturnstherootmeansquarederror(RMSE)achievedontheQuizset,whichispostedonapublicleaderboard(/leaderboard).RMSEvaluesmentionedinthisarticlecorrespondtotheQuizset.Ultimately,thewinneroftheprizeistheonethatscoresbestontheTestset,andthosescoreswereneverdisclosedbyNet ix.Thisprecludescleversystemswhichmight“game”thecompetitionbylearningabouttheQuizsetthroughrepeatedsubmissions.

Comparedwiththetrainingdata,theHold-outsetcontainsmanymoreratingsbyusersthatdonotratemuchandare

Y.KoreniswithYahoo!Research,

Haifa,

ISRAEL.

Email:yehuda@

thereforehardertopredict.Inaway,thisrepresentsrealrequirementsforacollaborative ltering(CF)system,whichneedstopredictnewratingsfromolderones,andtoequallyaddressallusers,notjusttheheavyraters.

Wereservespecialindexingletterstodistinguishusersfrommovies:forusersu,v,andformoviesi,j.Aratingruiindicatesthepreferencebyuseruofmoviei.Valuesarerangingfrom1(star)indicatingnointerestto5(stars)indicatingastronginterest.Wedistinguishpredictedratingsfromknownones,byusingthenotationr uiforthepredictedvalueofrui.

Thescalartuidenotesthetimeofratingrui.Here,timeismeasuredindays,sotuicountsthenumberofdayselapsedsincesomeearlytimepoint.About99%ofthepossibleratingsaremissing,becauseausertypicallyratesonlyasmallportionofthemovies.The(u,i)pairsforwhichruiisknownarestoredinthetrainingsetK={(u,i)|ruiisknown}.NoticethatKincludesalsotheProbeset.EachuseruisassociatedwithasetofitemsdenotedbyR(u),whichcontainsalltheitemsforwhichratingsbyuareavailable.Likewise,R(i)denotesthesetofuserswhorateditemi.Sometimes,wealsouseasetdenotedbyN(u),whichcontainsallitemsforwhichuprovidedarating,eveniftheratingvalueisunknown.Thus,N(u)extendsR(u)byalsoconsideringtheratingsintheQualifyingset.

Modelsfortheratingdataarelearnedby ttingthepre-viouslyobservedratings(trainingset).However,ourgoalistogeneralizethoseinawaythatallowsustopredictfuture,unknownratings(Qualifyingset).Thus,cautionshouldbeex-ercisedtoavoidover ttingtheobserveddata.Weachievethisbyregularizingthelearnedparameters,whosemagnitudesarepenalized.Theextentofregularizationiscontrolledbytunableconstants.Unlessotherwisestated,weuseL2regularization.Thisisagoodplacetoaddsomewordsontheconstantscontrollingouralgorithms(includingstepsizes,regularization,andnumberofiterations).ExactvaluesoftheseconstantsaredeterminedbyvalidationontheProbeset.Inallcasesbutone(tobementionedbelow),suchvalidationisdoneinamanualgreedymanner.Thatis,whenanewlyintroducedconstantneedstogettuned,weexecutemultiplerunsofthealgorithmsandpickthevaluethatyieldsthebestRMSEontheNet ixProbeset[4].Thisschemedoesnotresultinoptimalsettingsforseveralreasons.First,onceaconstantissetwedonotrevisititsvalue,eventhoughfutureintroductionofotherconstantsmayrequiremodifyingearliersettings.Second,weusethesameconstantsundermultiplevariantsofthesamealgorithm(e.g.,multipledimensionalitiesofafactorizationmodel),whereasamoredelicatetuningwouldrequireadifferentsettingforeachvariant.Wechosethisconvenient,butlessaccuratemethod,becauseourexperienceshowedthatovertuningtheaccuracyofasinglepredictordoes

推荐系统netflix获奖算法.doc 将本文的Word文档下载到电脑

精彩图片

热门精选

大家正在看

× 游客快捷下载通道(下载后可以自由复制和排版)

限时特价:7 元/份 原价:20元

支付方式:

开通VIP包月会员 特价:29元/月

注:下载文档有可能“只有目录或者内容不全”等情况,请下载之前注意辨别,如果您已付费且无法下载或内容有问题,请联系我们协助你处理。
微信:fanwen365 QQ:370150219