Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
Google Inc., Mountain View, CA
tmikolov@google.com, kaichen@google.com, gcorrado@google.com, jeff@google.com

arXiv:1301.3781v2 [cs.CL] 7 Mar 2013

Abstract

We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

1 Introduction

Many current NLP systems and techniques treat words as atomic units - there is no notion of similarity between words, as these are represented as indices in a vocabulary. This choice has several good reasons - simplicity, robustness and the observation that simple models trained on huge amounts of data outperform complex systems trained on less data. An example is the popular N-gram model used for statistical language modeling - today, it is possible to train N-grams on virtually all available data (trillions of words [3]).

However, the simple techniques are at their limits in many tasks. For example, the amount of relevant in-domain data for automatic speech recognition is limited - the performance is usually dominated by the size of high quality transcribed speech data (often just millions of words). In machine translation, the existing corpora for many languages contain only a few billions of words or less. Thus, we are in a situation where simple scaling up of the basic techniques will not result in any significant progress, and we have to focus on more advanced techniques.

With progress of machine learning techniques in recent years, it has become possible to train more complex models on much larger data sets, and they typically outperform the simple models. Probably the most successful concept is to use distributed representations of words [10]. For example, neural network based language models significantly outperform N-gram models [1, 25, 16].

1.1 Goals of the Paper

The main goal of this paper is to introduce techniques that can be used for learning high-quality word vectors from huge data sets with billions of words, and with millions of words in the vocabulary. As far as we know, none of the previously proposed architectures has been successfully trained on more than a few hundred of millions of words, with a modest dimensionality of the word vectors between 50 - 100.

We use recently proposed techniques for measuring the quality of the resulting vector representations, with the expectation that not only will similar words tend to be close to each other, but that words can have multiple degrees of similarity [19]. This has been observed earlier in the context of inflectional languages - for example, nouns can have multiple word endings, and if we search for similar words in a subspace of the original vector space, it is possible to find words that have similar endings [12, 13].
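
To make the subspace idea concrete, the sketch below restricts a cosine-similarity comparison to a subset of vector dimensions. The vectors and the choice of dimensions are purely hypothetical; in a trained model such structure is learned rather than hand-picked.

```python
import numpy as np

# Hypothetical 4-dimensional word vectors; assume the last two dimensions
# happen to encode the inflectional ending.
words = {
    "walked":  np.array([0.9, 0.1,  0.7, 0.6]),
    "jumped":  np.array([-0.2, 0.8, 0.7, 0.6]),
    "walking": np.array([0.9, 0.1, -0.2, 0.5]),
}
subspace = [2, 3]  # the dimensions assumed to carry the word ending

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q = words["walked"]
# In the full space "walking" is the nearest neighbour of "walked";
# restricted to the subspace, "jumped" wins because it shares the same ending.
print(cosine(q, words["walking"]), cosine(q, words["jumped"]))
print(cosine(q[subspace], words["walking"][subspace]), cosine(q[subspace], words["jumped"][subspace]))
```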

Somewhat surprisingly, it was found that similarity of word representations goes beyond simple syntactic regularities. Using a word offset technique where simple algebraic operations are performed on the word vectors, it was shown for example that vector(”King”) - vector(”Man”) + vector(”Woman”) results in a vector that is closest to the vector representation of the word Queen [19].
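
This analogy retrieval amounts to a nearest-neighbour search under cosine similarity around the offset vector. Below is a minimal sketch using a tiny, hypothetical set of pre-trained vectors; real vectors have hundreds of dimensions and come from training on large corpora.

```python
import numpy as np

# Hypothetical toy embeddings standing in for trained word vectors.
vocab = {
    "king":  np.array([0.80, 0.55, 0.10]),
    "man":   np.array([0.75, 0.10, 0.20]),
    "woman": np.array([0.70, 0.15, 0.65]),
    "queen": np.array([0.72, 0.58, 0.55]),
    "apple": np.array([0.05, 0.90, 0.40]),
}

def closest(query, exclude=()):
    """Word whose vector has the highest cosine similarity to `query`."""
    best_word, best_sim = None, -1.0
    for word, vec in vocab.items():
        if word in exclude:      # the input words themselves are usually skipped
            continue
        sim = query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# vector("King") - vector("Man") + vector("Woman") lands closest to "queen".
offset = vocab["king"] - vocab["man"] + vocab["woman"]
print(closest(offset, exclude={"king", "man", "woman"}))   # queen
```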

In this paper, we try to maximize accuracy of these vector operations by developing new model architectures that preserve the linear regularities among words. We design a new comprehensive test set for measuring both syntactic and semantic regularities¹, and show that many such regularities can be learned with high accuracy. Moreover, we discuss how training time and accuracy depends on the dimensionality of the word vectors and on the amount of the training data.

1.2 Previous Work

Representation of words as continuous vectors has a long history [10, 24, 8]. A very popular model architecture for estimating neural network language model (NNLM) was proposed in [1], where a feedforward neural network with a linear projection layer and a non-linear hidden layer was used to learn jointly the word vector representation and a statistical language model. This work has been followed by many others.
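
As an illustration of that architecture, the following is a minimal forward-pass sketch of such a feedforward NNLM, not the implementation from [1]: the previous N words are looked up in a shared projection matrix (the word vectors), concatenated, passed through a non-linear hidden layer, and a softmax over the vocabulary scores the next word. All sizes and names are illustrative.

```python
import numpy as np

V, D, N, H = 10_000, 100, 4, 200        # vocabulary, vector size, context words, hidden units (illustrative)
rng = np.random.default_rng(0)

P   = rng.normal(0, 0.1, (V, D))        # projection matrix: row i is the vector of word i
W_h = rng.normal(0, 0.1, (N * D, H))    # projection layer -> hidden layer
W_o = rng.normal(0, 0.1, (H, V))        # hidden layer -> output layer

def nnlm_forward(context_ids):
    """Probability distribution over the next word given N previous word indices."""
    x = P[context_ids].reshape(-1)      # look up and concatenate N word vectors (linear projection)
    h = np.tanh(x @ W_h)                # non-linear hidden layer
    scores = h @ W_o                    # one score per vocabulary word
    scores -= scores.max()              # numerical stability for the softmax
    probs = np.exp(scores)
    return probs / probs.sum()

p_next = nnlm_forward(np.array([12, 7, 430, 2]))   # four arbitrary word indices
print(p_next.shape, round(p_next.sum(), 3))        # (10000,) 1.0
```

During training, the projection matrix P and the language-model weights would be updated jointly, which is what learning the word vectors together with a statistical language model refers to above.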

Another interesting architecture of NNLM was presented in [12, 13], where the word vectors are first learned using neural network with a single hidden layer. The word vectors are then used to train the NNLM. Thus, the word vectors are learned even without constructing the full NNLM. In this work, we directly extend this architecture, and focus just on the first step where the word vectors are learned using a simple model.

It was later shown that the word vectors can be used to significantly improve and simplify many NLP applications [4, 5, 26]. Estimation of the word vectors itself was performed using different model architectures and trained on various corpora [4, 26, 21, 18, 9], and some of the resulting word vectors were made available for future research and comparison². However, as far as we know, these architectures were significantly more computationally expensive for training than the one proposed in [12], with the exception of certain version of the log-bilinear model where diagonal weight matrices are used [21].

2 Model Architectures

Many different types of models were proposed for estimating continuous representations of words, including the well-known Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). In this paper, we focus on distributed representations of words learned by neural networks, as it was previously shown that they perform significantly better than LSA for preserving linear regularities among words [19, 28]; LDA moreover becomes computationally very expensive on large data sets.

Similar to [17], to compare different model architectures we define first the computational complexity of a model as the number of parameters that need to be accessed to fully train the model. Next, we will try to maximize the accuracy, while minimizing the computational complexity.
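
As a rough illustration of this measure, the per-example parameter accesses of the toy NNLM sketched in Section 1.2 can be counted directly from its layer sizes. The breakdown below is an assumption derived from the definition above, not a figure taken from the paper.

```python
def parameters_accessed_per_example(V, D, N, H):
    """Parameters touched for one training example of the toy feedforward NNLM above."""
    projection = N * D       # N word vectors of dimension D are looked up
    hidden     = N * D * H   # dense projection-to-hidden weights
    output     = H * V       # hidden-to-output weights dominate for large vocabularies
    return projection + hidden + output

print(parameters_accessed_per_example(V=10_000, D=100, N=4, H=200))   # 2080400, dominated by H * V
```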

¹ .../~imikolov/rnnlm/
² .../senna/, .../projects/wordreprs/, .../~imikolov/rnnlm/, .../~ehhuang/
