Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Google Inc., Mountain View, CA (tmikolov@google.com)
Kai Chen, Google Inc., Mountain View, CA (kaichen@google.com)
Greg Corrado, Google Inc., Mountain View, CA (gcorrado@google.com)
Jeffrey Dean, Google Inc., Mountain View, CA (jeff@google.com)
Abstract
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
1 Introduction
Many current NLP systems and techniques treat words as atomic units - there is no notion of similarity between words, as these are represented as indices in a vocabulary. This choice has several good reasons - simplicity, robustness and the observation that simple models trained on huge amounts of data outperform complex systems trained on less data. An example is the popular N-gram model used for statistical language modeling - today, it is possible to train N-grams on virtually all available data (trillions of words [3]).
However, the simple techniques are at their limits in many tasks. For example, the amount of relevant in-domain data for automatic speech recognition is limited - the performance is usually dominated by the size of high quality transcribed speech data (often just millions of words). In machine translation, the existing corpora for many languages contain only a few billions of words or less. Thus, we are in a situation where simple scaling up of the basic techniques will not result in any significant progress, and we have to focus on more advanced techniques.
With progress of machine learning techniques in recent years, it has become possible to train more complex models on much larger data sets, and they typically outperform the simple models. Probably the most successful concept is to use distributed representations of words [10]. For example, neural network based language models significantly outperform N-gram models [1, 25, 16].
1.1 Goals of the Paper
The main goal of this paper is to introduce techniques that can be used for learning high-quality word vectors from huge data sets with billions of words, and with millions of words in the vocabulary. As far as we know, none of the previously proposed architectures has been successfully trained on more
than a few hundred of millions of words, with a modest dimensionality of the word vectors between 50 - 100.
We use recently proposed techniques for measuring the quality of the resulting vector representations, with the expectation that not only will similar words tend to be close to each other, but that words can have multiple degrees of similarity [19]. This has been observed earlier in the context of inflectional languages - for example, nouns can have multiple word endings, and if we search for similar words in a subspace of the original vector space, it is possible to find words that have similar endings [12, 13].
Somewhat surprisingly, it was found that similarity of word representations goes beyond simple syntactic regularities. Using a word offset technique where simple algebraic operations are performed on the word vectors, it was shown for example that vector("King") - vector("Man") + vector("Woman") results in a vector that is closest to the vector representation of the word Queen [19].
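To make the word offset technique concrete, the following minimal sketch shows how such an analogy query could be answered with vector arithmetic and a cosine-similarity nearest-neighbor search; the toy vectors and the helper names (word_vectors, analogy) are illustrative assumptions, not the paper's implementation.

import numpy as np

# Toy word vectors; in practice these would be the learned representations.
word_vectors = {
    "king":  np.array([0.8, 0.6, 0.10]),
    "man":   np.array([0.7, 0.1, 0.00]),
    "woman": np.array([0.6, 0.1, 0.90]),
    "queen": np.array([0.7, 0.6, 0.95]),
}

def analogy(a, b, c, vectors):
    """Return the word whose vector is closest (cosine similarity)
    to vector(a) - vector(b) + vector(c), excluding the query words."""
    target = vectors[a] - vectors[b] + vectors[c]
    best_word, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue
        sim = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

print(analogy("king", "man", "woman", word_vectors))  # expected: "queen"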
In this paper, we try to maximize accuracy of these vector operations by developing new model architectures that preserve the linear regularities among words. We design a new comprehensive test set for measuring both syntactic and semantic regularities^1, and show that many such regularities can be learned with high accuracy. Moreover, we discuss how training time and accuracy depends on the dimensionality of the word vectors and on the amount of the training data.
1.2 Previous Work
Representation of words as continuous vectors has a long history [10, 24, 8]. A very popular model architecture for estimating a neural network language model (NNLM) was proposed in [1], where a
feedforward neural network with a linear projection layer and a non-linear hidden layer was used to learn jointly the word vector representation and a statistical language model. This work has been followed by many others.
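For readers unfamiliar with this family of models, the sketch below illustrates the forward pass of such a feedforward NNLM - a linear projection layer followed by a non-linear hidden layer predicting the next word from the previous N words. The layer sizes, variable names and initialization are assumptions chosen for illustration, not the exact architecture of [1].

import numpy as np

# Illustrative sizes: vocabulary, word vector dim, context words, hidden units.
V, D, N, H = 10000, 100, 4, 500

rng = np.random.default_rng(0)
C = rng.normal(scale=0.1, size=(V, D))        # shared projection (word vector) matrix
W_h = rng.normal(scale=0.1, size=(N * D, H))  # projection -> hidden weights
W_o = rng.normal(scale=0.1, size=(H, V))      # hidden -> output weights

def forward(context_ids):
    """Return a probability distribution over the next word
    given the indices of the previous N words."""
    x = C[context_ids].reshape(-1)   # linear projection: concatenate N word vectors
    h = np.tanh(x @ W_h)             # non-linear hidden layer
    logits = h @ W_o                 # output layer over the full vocabulary
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()       # softmax

p_next = forward(np.array([12, 7, 421, 9]))   # arbitrary example context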
Another interesting architecture of NNLM was presented in [12, 13], where the word vectors are first learned using a neural network with a single hidden layer. The word vectors are then used to train the NNLM. Thus, the word vectors are learned even without constructing the full NNLM. In this work, we directly extend this architecture, and focus just on the first step where the word vectors are learned using a simple model.
It was later shown that the word vectors can be used to significantly improve and simplify many NLP applications [4, 5, 26]. Estimation of the word vectors itself was performed using different model architectures and trained on various corpora [4, 26, 21, 18, 9], and some of the resulting word vectors were made available for future research and comparison^2. However, as far as we know, these architectures were significantly more computationally expensive for training than the one proposed in [12], with the exception of certain versions of the log-bilinear model where diagonal weight matrices are used [21].
2 Model Architectures
Many different types of models were proposed for estimating continuous representations of words, including the well-known Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). In this paper, we focus on distributed representations of words learned by neural networks, as it was previously shown that they perform significantly better than LSA for preserving linear regularities among words [19, 28]; LDA moreover becomes computationally very expensive on large data sets.
Similar to [17], to compare different model architectures we first define the computational complexity of a model as the number of parameters that need to be accessed to fully train the model. Next, we will try to maximize the accuracy, while minimizing the computational complexity.
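As an illustration of what "number of parameters accessed" means in practice, the short sketch below counts the parameters touched per training example by a feedforward NNLM of the kind described in Section 1.2. The sizes are arbitrary and the accounting (projection, hidden and output terms) is a simplified assumption for illustration, not a formula taken from this section.

# Rough per-example parameter-access count for a feedforward NNLM
# (sizes are arbitrary; the accounting is a simplified illustration).
V, D, N, H = 10000, 100, 4, 500   # vocabulary, vector dim, context words, hidden units

projection = N * D          # copy N word vectors of dimension D
hidden     = N * D * H      # dense layer from the concatenated projection to the hidden units
output     = H * V          # dense layer from the hidden units to a full softmax over V words

per_example = projection + hidden + output
print(per_example)          # dominated by the H * V output term for realistic vocabulary sizes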
^1 /~imikolov/rnnlm/
^2 /senna/
   /projects/wordreprs/
   /~imikolov/rnnlm/
   /~ehhuang/