Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Google Inc., Mountain View, CA (tmikolov@google.com)
Kai Chen, Google Inc., Mountain View, CA (kaichen@google.com)
Greg Corrado, Google Inc., Mountain View, CA (gcorrado@google.com)
Jeffrey Dean, Google Inc., Mountain View, CA (jeff@google.com)
Abstract
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
1 Introduction
Many current NLP systems and techniques treat words as atomic units - there is no notion of similarity between words, as these are represented as indices in a vocabulary. This choice has several good reasons - simplicity, robustness and the observation that simple models trained on huge amounts of data outperform complex systems trained on less data. An example is the popular N-gram model used for statistical language modeling - today, it is possible to train N-grams on virtually all available data (trillions of words [3]).
However, the simple techniques are at their limits in many tasks. For example, the amount of relevant in-domain data for automatic speech recognition is limited - the performance is usually dominated by the size of high quality transcribed speech data (often just millions of words). In machine translation, the existing corpora for many languages contain only a few billions of words or less. Thus, we are in a situation where simple scaling up of the basic techniques will not result in any significant progress, and we have to focus on more advanced techniques.
With progress of machine learning techniques in recent years, it has become possible to train more complex models on much larger data sets, and they typically outperform the simple models. Probably the most successful concept is to use distributed representations of words [10]. For example, neural network based language models significantly outperform N-gram models [1, 25, 16].
1.1 Goals of the Paper
The main goal of this paper is to introduce techniques that can be used for learning high-quality word vectors from huge data sets with billions of words, and with millions of words in the vocabulary. As far as we know, none of the previously proposed architectures has been successfully trained on more
than a few hundred of millions of words, with a modest dimensionality of the word vectors between 50 - 100.
We use recently proposed techniques for measuring the quality of the resulting vector representations, with the expectation that not only will similar words tend to be close to each other, but that words can have multiple degrees of similarity [19]. This has been observed earlier in the context of inflectional languages - for example, nouns can have multiple word endings, and if we search for similar words in a subspace of the original vector space, it is possible to find words that have similar endings [12, 13].
Somewhat surprisingly, it was found that similarity of word representations goes beyond simple syntactic regularities. Using a word offset technique where simple algebraic operations are performed on the word vectors, it was shown for example that vector("King") - vector("Man") + vector("Woman") results in a vector that is closest to the vector representation of the word Queen [19].
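To make the word offset technique concrete, the following minimal sketch shows how such an analogy query could be answered with vector arithmetic and a cosine-similarity nearest-neighbor search; the toy vectors and the helper names (word_vectors, analogy) are illustrative assumptions, not the paper's implementation.

import numpy as np

# Toy word vectors; in practice these would be the learned representations.
word_vectors = {
    "king":  np.array([0.8, 0.6, 0.10]),
    "man":   np.array([0.7, 0.1, 0.00]),
    "woman": np.array([0.6, 0.1, 0.90]),
    "queen": np.array([0.7, 0.6, 0.95]),
}

def analogy(a, b, c, vectors):
    """Return the word whose vector is closest (cosine similarity)
    to vector(a) - vector(b) + vector(c), excluding the query words."""
    target = vectors[a] - vectors[b] + vectors[c]
    best_word, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue
        sim = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

print(analogy("king", "man", "woman", word_vectors))  # expected: "queen"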
In this paper, we try to maximize accuracy of these vector operations by developing new model architectures that preserve the linear regularities among words. We design a new comprehensive test set for measuring both syntactic and semantic regularities^1, and show that many such regularities can be learned with high accuracy. Moreover, we discuss how training time and accuracy depends on the dimensionality of the word vectors and on the amount of the training data.
1.2 Previous Work
Representation of words as continuous vectors has a long history [10, 24, 8]. A very popular model architecture for estimating a neural network language model (NNLM) was proposed in [1], where a
feedforward neural network with a linear projection layer and a non-linear hidden layer was used to learn jointly the word vector representation and a statistical language model. This work has been followed by many others.
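For readers unfamiliar with this family of models, the sketch below illustrates the forward pass of such a feedforward NNLM - a linear projection layer followed by a non-linear hidden layer predicting the next word from the previous N words. The layer sizes, variable names and initialization are assumptions chosen for illustration, not the exact architecture of [1].

import numpy as np

# Illustrative sizes: vocabulary, word vector dim, context words, hidden units.
V, D, N, H = 10000, 100, 4, 500

rng = np.random.default_rng(0)
C = rng.normal(scale=0.1, size=(V, D))        # shared projection (word vector) matrix
W_h = rng.normal(scale=0.1, size=(N * D, H))  # projection -> hidden weights
W_o = rng.normal(scale=0.1, size=(H, V))      # hidden -> output weights

def forward(context_ids):
    """Return a probability distribution over the next word
    given the indices of the previous N words."""
    x = C[context_ids].reshape(-1)   # linear projection: concatenate N word vectors
    h = np.tanh(x @ W_h)             # non-linear hidden layer
    logits = h @ W_o                 # output layer over the full vocabulary
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()       # softmax

p_next = forward(np.array([12, 7, 421, 9]))   # arbitrary example context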
Another interesting architecture of NNLM was presented in [12, 13], where the word vectors are first learned using a neural network with a single hidden layer. The word vectors are then used to train the NNLM. Thus, the word vectors are learned even without constructing the full NNLM. In this work, we directly extend this architecture, and focus just on the first step where the word vectors are learned using a simple model.
It was later shown that the word vectors can be used to significantly improve and simplify many NLP applications [4, 5, 26]. Estimation of the word vectors itself was performed using different model architectures and trained on various corpora [4, 26, 21, 18, 9], and some of the resulting word vectors were made available for future research and comparison^2. However, as far as we know, these architectures were significantly more computationally expensive for training than the one proposed in [12], with the exception of certain versions of the log-bilinear model where diagonal weight matrices are used [21].
2 Model Architectures
Many different types of models were proposed for estimating continuous representations of words, including the well-known Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). In this paper, we focus on distributed representations of words learned by neural networks, as it was previously shown that they perform significantly better than LSA for preserving linear regularities among words [19, 28]; LDA moreover becomes computationally very expensive on large data sets.
Similar to [17], to compare different model architectures we first define the computational complexity of a model as the number of parameters that need to be accessed to fully train the model. Next, we will try to maximize the accuracy, while minimizing the computational complexity.
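As an illustration of what "number of parameters accessed" means in practice, the short sketch below counts the parameters touched per training example by a feedforward NNLM of the kind described in Section 1.2. The sizes are arbitrary and the accounting (projection, hidden and output terms) is a simplified assumption for illustration, not a formula taken from this section.

# Rough per-example parameter-access count for a feedforward NNLM
# (sizes are arbitrary; the accounting is a simplified illustration).
V, D, N, H = 10000, 100, 4, 500   # vocabulary, vector dim, context words, hidden units

projection = N * D          # copy N word vectors of dimension D
hidden     = N * D * H      # dense layer from the concatenated projection to the hidden units
output     = H * V          # dense layer from the hidden units to a full softmax over V words

per_example = projection + hidden + output
print(per_example)          # dominated by the H * V output term for realistic vocabulary sizes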
^1 /~imikolov/rnnlm/
^2 /senna/
   /projects/wordreprs/
   /~imikolov/rnnlm/
   /~ehhuang/