推荐系统netflix获奖算法(3)

发布时间:2021-06-07

赢得netflix推荐系统大奖的算法

consideredforparameterizingtemporaluserbehavior,withvaryingcomplexityandaccuracy.

Onesimplemodelingchoiceusesalinearfunctiontocaptureapossiblegradualdriftofuserbias.Foreachuseru,wedenotethemeandateofratingbytu.Now,ifuratedamovieondayt,thentheassociatedtimedeviationofthisratingisde nedas

devu(t)=sign(t tu)·|t tu|β.

Here|t tu|measuresthenumberofdaysbetweendatestandtu.WesetthevalueofβbyvalidationontheProbeset;inourimplementationβ=0.4.Weintroduceasinglenewparameterforeachusercalledαusothatwegetour rstde nitionofatime-dependentuser-bias:

bu(1)

(t)=bu+αu·devu(t)

(7)

Thissimplelinearmodelforapproximatingadriftingbehaviorrequireslearningtwoparametersperuser:buandαu.

Thelinearfunctionformodelingtheuserbiasmesheswellwithgradualdriftsintheuserbehavior.However,wealsoobservesuddendriftsemergingas“spikes”associatedwithasingledayorsession.Forexample,wehavefoundthatmultipleratingsausergivesinasingleday,tendtoconcentratearoundasinglevalue.Suchaneffectneednotspanmorethanasingleday.Thismayre ectthemoodoftheuserthatday,theimpactofratingsgiveninasingledayoneachother,orchangesintheactualraterinmulti-personaccounts.Toaddresssuchshortlivedeffects,weassignasingleparameterperuserandday,absorbingtheday-speci cvariability.Thisparameterisdenotedbybut.

IntheNet ixdata,auserrateson40differentdaysonaverage.Thus,workingwithbutrequires,onaverage,40parameterstodescribeeachuserbias.Itisexpectedthatbutisinadequateasastandaloneforcapturingtheuserbias,sinceitmissesallsortsofsignalsthatspanmorethanasingleday.Thus,itservesasanadditivecomponentwithinthepreviouslydescribedschemes.Theuserbiasmodel(7)becomes

bu(3)

(t)=bu+αu·devu(t)+but.

(8)

Thediscussionsofarleadstothebaselinepredictorbui=µ+bu+αu·devu(tui)+bu,tui+bi+bi,Bin(tui).

(9)

Ifusedasastandalonepredictor,itsresultingRMSEwouldbe0.9605.

Anothereffectwithinthescopeofbaselinepredictorsisrelatedtothechangingscaleofuserratings.Whilebi(t)isauser-independentmeasureforthemeritofitemiattimet,userstendtorespondtosuchameasuredifferently.Forexample,differentusersemploydifferentratingscales,andasingleusercanchangehisratingscaleovertime.Accordingly,therawvalueofthemoviebiasisnotcompletelyuser-independent.Toaddressthis,weaddatime-dependentscalingfeaturetothebaselinepredictors,denotedbycu(t).Thus,thebaselinepredictor(9)becomes

bui=µ+bu+αu·devu(tui)+bu,tui+(bi+bi,Bin(tui))·cu(tui).

(10)

3

Alldiscussedwaystoimplementbu(t)wouldbevalidforimplementingcu(t)aswell.Wechosetodedicateaseparateparameterperday,resultingin:cu(t)=cu+cut.Asusual,cuisthestablepartofcu(t),whereascutrepresentsday-speci cvariability.

Addingthemultiplicativefactorcu(t)tothebaselinepre-dictor(asper(10))lowersRMSEto0.9555.Interestingly,thisbasicmodel,whichcapturesjustmaineffectsdisregardinguser-iteminteractions,canexplainalmostasmuchofthedatavariabilityasthecommercialNet ixCinematchrecommendersystem,whosepublishedRMSEonthesameQuizsetis0.9514[4].B.Frequencies

ItwasbroughttoourattentionbyourcolleaguesatthePragmaticTheoryteam(PT)thatthenumberofratingsausergaveonaspeci cdayexplainsasigni cantportionofthevariabilityofthedataduringthatday.Formally,denotebyFuitheoverallnumberofratingsthatuserugaveondaytui.ThevalueofFuiwillbehenceforthdubbeda“frequency”,followingPT’snotation.InpracticeweworkwitharoundedlogarithmofFui,denotedbyfui= logaFui .1

Interestingly,eventhoughfuiissolelydrivenbyuseru,itwillin uencetheitem-biases,ratherthantheuser-biases.Accordingly,foreachitemiweintroduceatermbif,capturingthebiasspeci cfortheitemiatlog-frequencyf.Baselinepredictor(10)isextendedtobe

bui=µ+bu+αu·devu(tui)+bu,tui+(bi+bi,Bin(tui))·cu(tui)+bi,fui.

(11)

Wenotethatitwouldbesensibletomultiplybi,fuibycu(tui),butwehavenotexperimentedwiththis.

Theeffectofaddingthefrequencytermtothemoviebiasisquitedramatic.RMSEdropsfrom0.9555to0.9278.Notably,itshowsabaselinepredictorwithapredictionaccuracysigni cantlybetterthanthatoftheoriginalNet ixCinematchalgorithm.

Here,itisimportanttoremindthatabaselinepredictor,nomatterhowaccurate,cannotyieldpersonalizedrecommenda-tionsonitsown,asitmissesallinteractionsbetweenusersanditems.Inasense,itiscapturingtheportionofthedatathatislessrelevantforestablishingrecommendationsandindoingsoenablesderivingaccuraterecommendationsbysubjectingothermodelstocleanerdata.Nonetheless,weincludedtwoofthemoreaccuratebaselinepredictorsinourblend.

Whyfrequencieswork?:Inordertograspthesourceoffrequenciescontribution,wemaketwoempiricalobservations.First,wecouldseethatfrequenciesareextremelypowerfulforastandalonebaselinepredictor,butaswewillsee,theycontributemuchlesswithinafullmethod,wheremostoftheirbene tdisappearswhenaddingtheuser-movieinteractionterms(matrixfactorizationorneighborhood).Secondisthefactthatfrequenciesseemtobemuchmorehelpfulwhenusedwithmoviebiases,butnotsowhenusedwithuser-relatedparameters.

1Notice

thatFuiisstrictlypositivewheneveritisused,sothelogarithmis

wellde ned.

推荐系统netflix获奖算法(3).doc 将本文的Word文档下载到电脑

精彩图片

热门精选

大家正在看

× 游客快捷下载通道(下载后可以自由复制和排版)

限时特价:7 元/份 原价:20元

支付方式:

开通VIP包月会员 特价:29元/月

注:下载文档有可能“只有目录或者内容不全”等情况,请下载之前注意辨别,如果您已付费且无法下载或内容有问题,请联系我们协助你处理。
微信:fanwen365 QQ:370150219