Clustering using firefly algorithm: Performance study (3)
Published: 2021-06-07
Firefly algorithm
166 J. Senthilnath et al. / Swarm and Evolutionary Computation 1 (2011) 164–171
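where $K$ is the number of clusters, $x_i$ ($i = 1, \ldots, n$) is the location of the $i$th of the $n$ given patterns, and $c_k$ ($k = 1, \ldots, K$) is the $k$th cluster center, to be found by Eq. (6):

$$c_k = \frac{1}{n_k} \sum_{i \in C_k} x_i \tag{6}$$

where $n_k$ is the number of patterns in the $k$th cluster.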
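As a minimal sketch of Eq. (6), the snippet below computes each cluster center as the mean of the patterns assigned to that cluster. The function name `cluster_centers`, the integer label encoding, and the handling of empty clusters are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def cluster_centers(X, labels, K):
    """Eq. (6): c_k is the mean of the n_k patterns assigned to cluster k."""
    centers = np.zeros((K, X.shape[1]))
    for k in range(K):
        members = X[labels == k]      # patterns x_i with i in C_k
        if len(members) > 0:          # assumption: an empty cluster stays at the origin
            centers[k] = members.mean(axis=0)
    return centers

# Three 2-D patterns: the first two in cluster 0, the last in cluster 1.
X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]])
labels = np.array([0, 0, 1])
centers = cluster_centers(X, labels, K=2)
print(centers)  # cluster 0 -> [2, 3], cluster 1 -> [10, 10]
```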
Cluster analysis assigns a data set to clusters so that patterns are grouped into the same cluster based on some similarity measure [23]. Distance measurement is most widely used for evaluating similarities between patterns. The cluster centers are the decision variables, which are obtained by minimizing the sum of Euclidean distances, over all training set instances in the $d$-dimensional space, between a generic instance $x_i$ and the center of the cluster $c_k$. The cost (objective) function for the pattern $i$ is given by Eq. (7), as in [9,14]:

$$f_i = \frac{1}{D_{Train}} \sum_{j=1}^{D_{Train}} d\left(x_j, p_i^{CL_{known}(x_j)}\right) \tag{7}$$

where $D_{Train}$ is the number of training data, which is used to normalize the sum so that any distance falls within [0.0, 1.0], and $p_i^{CL_{known}(x_j)}$ defines the class that the instance belongs to according to the database.

Note that in our FA algorithm, the decision variables are the cluster centers. The objective function in our FA algorithm is given by Eq. (7). In our study, we consider the standard 13 benchmark problems given in [14]. For a given data set, let $n$ be the number of data points, $d$ the dimension, and $c$ the number of classes. A given data point belongs to only one of these $c$ classes. Of the given data set, 75% of the data are randomly selected to obtain the cluster centers using Eq. (7). In this way we obtain the cluster centers for all the $c$ classes. The remaining 25% of the data set (called the test data set) is used to obtain the classification error percentage (CEP). An illustrative example of this FA algorithm and its performance measure is given in the next section.
4. Performance measures and an illustrative example
As discussed in the earlier section, the FA obtains the cluster centers from the training data. Using these cluster centers, the testing data set is classified and the classification performance is analyzed.
4.1. Performance evaluation
The performance of the knowledge extracted by the FA in the form of cluster centers is evaluated using the Classification Error Percentage (CEP) and the classification efficiency. CEP depends only on the test data, while the classification efficiency depends on both training and testing data.
4.1.1. Classification Error Percentage (CEP)
CEP is obtained using only the test data [9]. For each problem, we report the CEP, which is the percentage of incorrectly classified patterns of the test data sets as given in [9], to make a reliable comparison.
Each pattern is classified by assigning it to the class whose cluster center is closest to it. Then, the classified output is compared with the desired output, and if they are not exactly the same, the pattern is marked as misclassified [9]. This procedure is applied to all test data, and the total number of misclassified patterns is expressed as a percentage of the size of the test data set, which is given by

$$\mathrm{CEP} = \frac{\text{number of misclassified samples}}{\text{total size of test data set}} \times 100. \tag{8}$$
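The nearest-center rule and Eq. (8) can be illustrated with the short sketch below. The helper names `classify` and `cep`, the example centers, and the four test points are assumptions chosen for illustration only.

```python
import numpy as np

def classify(centers, X):
    """Assign each pattern to the class whose cluster center is nearest."""
    # Pairwise Euclidean distances, shape (number of patterns, number of centers).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def cep(y_true, y_pred):
    """Eq. (8): percentage of misclassified test patterns."""
    return 100.0 * np.mean(y_true != y_pred)

centers = np.array([[8.0, 8.0], [16.0, 16.0]])
X_test = np.array([[7.0, 9.0], [15.0, 17.0], [12.5, 12.5], [9.0, 8.0]])
y_test = np.array([0, 1, 0, 0])

y_pred = classify(centers, X_test)
error = cep(y_test, y_pred)
print(y_pred, error)  # the third point is nearer to center 1, so 1 of 4 is wrong: 25.0
```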
Fig. 1. Data distribution. [Scatter plot of the training and testing data for Class 1 and Class 2; axes x and y.]
4.1.2. Classification efficiency
Classification efficiency is obtained using both the training and test data. The classification matrix is used to obtain the statistical measures for the class-level performance (individual efficiency) and the global performance (average and overall efficiency) of the classifier [24]. The individual efficiency is indicated by the percentage classification, which tells us how many samples belonging to a particular class have been correctly classified. The percentage classification ($\eta_i$) for the class $c_i$ is given by Eq. (9).
$$\eta_i = \frac{q_{ii}}{\sum_{j=1}^{n} q_{ji}} \tag{9}$$
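where $q_{ii}$ is the number of correctly classified samples and $n$ is the number of samples for the class $c_i$ in the data set. The global performance measures are the average ($\eta_a$) and overall ($\eta_o$) classification, which are defined as

$$\eta_a = \frac{1}{n_c} \sum_{i=1}^{n_c} \eta_i \tag{10}$$

$$\eta_o = \frac{1}{N} \sum_{i=1}^{n_c} q_{ii} \tag{11}$$

where $n_c$ is the total number of classes and $N$ is the number of patterns.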
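A rough sketch of Eqs. (9)–(11) is given below. It assumes the classification matrix stores the assigned class in rows and the true class in columns, so that entry `Q[j][i]` plays the role of $q_{ji}$ and the denominator of Eq. (9) is the column sum for class $c_i$. The function name `efficiencies` and the example matrix are hypothetical, and the values are returned as fractions (multiply by 100 for percentages).

```python
import numpy as np

def efficiencies(Q):
    """Individual, average and overall efficiency from a classification matrix.

    Assumed layout: Q[j][i] = number of class-i samples assigned to class j.
    """
    Q = np.asarray(Q, dtype=float)
    correct = np.diag(Q)                 # q_ii for every class
    eta_i = correct / Q.sum(axis=0)      # Eq. (9): correct / samples of class i
    eta_a = eta_i.mean()                 # Eq. (10): average over the n_c classes
    eta_o = correct.sum() / Q.sum()      # Eq. (11): correct / N patterns
    return eta_i, eta_a, eta_o

# Two classes with 25 samples each: 20 and 23 of them are classified correctly.
Q = [[20, 2],
     [5, 23]]
eta_i, eta_a, eta_o = efficiencies(Q)
print(eta_i, eta_a, eta_o)  # eta_i is [0.8, 0.92]; eta_a and eta_o are both 0.86
```

The average and overall efficiencies coincide here only because both classes have the same number of samples; with unbalanced classes they generally differ.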
4.2. Illustrative example
We illustrate how the Firefly Algorithm (FA) is used for clustering with the following synthetic data. Although the proposed algorithm can be used for any type of mixture model, we focus on a Gaussian mixture. Let us consider two Gaussian mixtures that have two input features, namely x and y. Here, the mean values $\mu_1 = [8, 8]^T$ and $\mu_2 = [16, 16]^T$ and the covariance matrix (x, y) = {(6, 3); (3, 2)} are assumed, and each class has an equal number of samples. In our experimentation, 100 samples are generated randomly for each class. Of these, 75 data points are used for training and the remaining 25 are used for testing in each class. The synthetic data generated are shown in Fig. 1.
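A data set of this kind could be generated along the following lines. This is only a sketch: the random seed, the variable names, and the choice to take the first 75 samples of each class for training are assumptions, and the resulting points will not match those plotted in Fig. 1.

```python
import numpy as np

rng = np.random.default_rng(0)                  # assumed seed, for repeatability only
cov = np.array([[6.0, 3.0], [3.0, 2.0]])        # shared covariance matrix
means = [np.array([8.0, 8.0]), np.array([16.0, 16.0])]

train_X, train_y, test_X, test_y = [], [], [], []
for label, mu in enumerate(means):
    samples = rng.multivariate_normal(mu, cov, size=100)   # 100 samples per class
    train_X.append(samples[:75])                 # 75 per class for training
    test_X.append(samples[75:])                  # remaining 25 for testing
    train_y.append(np.full(75, label))
    test_y.append(np.full(25, label))

X_train, y_train = np.vstack(train_X), np.concatenate(train_y)
X_test, y_test = np.vstack(test_X), np.concatenate(test_y)
print(X_train.shape, X_test.shape)  # (150, 2) (50, 2)
```

Because the samples are drawn independently, slicing the first 75 of each class is equivalent to a random per-class split.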
We use the firefly algorithm on the training data to obtain cluster centers. Let $x_i$ be one of the solutions (cluster centers) and $J_i$ be the objective function value for this cluster center.
We consider a population size of 5 fireflies at locations $x_1$, $x_2$, $x_3$, $x_4$ and $x_5$ within the 2d-dimensional search space. Now evaluate the fitness of the population $J_1$, $J_2$, $J_3$, $J_4$ and $J_5$ using Eq. (7), which is directly proportional to the light intensity $I_1$, $I_2$, $I_3$, $I_4$ and $I_5$. Now compare the intensity values of a firefly: if ($I_2 < I_1$) then move firefly 2 toward 1 using Eq. (4); similarly compare all the agents