CN108170663A

CN108170663A - Term vector processing method, device and equipment based on cluster

Info

Publication number: CN108170663A
Application number: CN201711123278.8A
Authority: CN
Inventors: 曹绍升; 杨新星; 周俊
Original assignee: Alibaba Group Holding Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2017-11-14
Filing date: 2017-11-14
Publication date: 2018-06-15
Also published as: TW201923620A; US10846483B2; EP3657360A1; SG11202002266YA; US20200167527A1; WO2019095836A1; EP3657360A4

Abstract

The embodiment of this specification discloses a cluster-based word vector processing method, device, and equipment. The scheme includes: the cluster includes a server cluster and a working machine cluster; each working machine in the working machine cluster reads part of the corpus, and reads the Extract words and their context words, obtain corresponding word vectors from servers in the server cluster and calculate gradients, and update the gradients to the server asynchronously; the server updates the word vectors of words and their context words according to the gradients.

Description

Cluster-based word vector processing method, device and equipment

技术领域technical field

本说明书涉及计算机软件技术领域，尤其涉及基于集群的词向量处理方法、装置以及设备。This description relates to the technical field of computer software, in particular to a cluster-based word vector processing method, device and equipment.

背景技术Background technique

如今的自然语言处理的解决方案，大都采用基于神经网络的架构，而在这种架构下一个重要的基础技术就是词向量。词向量是将词映射到一个固定维度的向量，该向量表征了该词的语义信息。Most of today's natural language processing solutions use a neural network-based architecture, and an important basic technology under this architecture is word vectors. A word vector is a vector that maps a word to a fixed dimension, which represents the semantic information of the word.

在现有技术中，常见的用于生成词向量的算法比如包括谷歌公司的单词向量算法、微软公司的深度神经网络算法等，往往在单机上运行。In the prior art, common algorithms for generating word vectors, such as Google's word vector algorithm and Microsoft's deep neural network algorithm, often run on a single machine.

基于现有技术，需要高效的大规模词向量训练方案。Based on the existing technology, an efficient large-scale word vector training scheme is needed.

发明内容Contents of the invention

本说明书实施例提供基于集群的词向量处理方法、装置以及设备，用以解决如下技术问题：需要高效的大规模词向量训练方案。The embodiments of this specification provide a cluster-based word vector processing method, device, and equipment to solve the following technical problem: an efficient large-scale word vector training solution is required.

为解决上述技术问题，本说明书实施例是这样实现的：In order to solve the above-mentioned technical problems, the embodiments of this specification are implemented as follows:

本说明书实施例提供的一种基于集群的词向量处理方法，所述集群包括多个工作机和服务器，所述方法包括：The embodiment of this specification provides a cluster-based word vector processing method, the cluster includes multiple working machines and servers, and the method includes:

各所述工作机分别执行：Each of the working machines executes respectively:

获取从部分语料中提取的词及其上下文词；Obtain words extracted from part of the corpus and their context words;

获取所述词及其上下文词的词向量；Obtain word vectors of the word and its context words;

根据所述词及其上下文词，以及对应的词向量，计算梯度；Calculate the gradient according to the word and its context word, and the corresponding word vector;

将所述梯度异步更新至所述服务器；asynchronously updating the gradient to the server;

所述服务器根据所述梯度，对所述词及其上下文词的词向量进行更新。The server updates word vectors of the word and its context words according to the gradient.

本说明书实施例提供的一种基于集群的词向量处理装置，所述集群包括多个工作机和服务器，所述装置位于所述集群，包括位于所述工作机的第一获取模块、第二获取模块、梯度计算模块、异步更新模块、位于所述服务器的词向量更新模块；The embodiment of this specification provides a cluster-based word vector processing device, the cluster includes multiple working machines and servers, the device is located in the cluster, and includes a first acquisition module and a second acquisition module located in the working machines module, a gradient calculation module, an asynchronous update module, and a word vector update module located at the server;

各工作机通过相应的模块分别执行：Each working machine executes respectively through corresponding modules:

所述第一获取模块获取从部分语料中提取的词及其上下文词；The first acquisition module acquires words extracted from part of the corpus and context words thereof;

所述第二获取模块获取所述词及其上下文词的词向量；The second obtaining module obtains the word vector of the word and its context word;

所述梯度计算模块根据所述词及其上下文词，以及对应的词向量，计算梯度；The gradient calculation module calculates the gradient according to the word and its context words, and the corresponding word vector;

所述异步更新模块将所述梯度异步更新至所述服务器；The asynchronous update module asynchronously updates the gradient to the server;

所述服务器的所述词向量更新模块根据所述梯度，对所述词及其上下文词的词向量进行更新。The word vector updating module of the server updates the word vectors of the word and its context words according to the gradient.

本说明书实施例提供的一种基于集群的词向量处理设备，所述设备属于所述集群，包括：A cluster-based word vector processing device provided in an embodiment of this specification, the device belonging to the cluster, includes:

至少一个处理器；以及，at least one processor; and,

与所述至少一个处理器通信连接的存储器；其中，a memory communicatively coupled to the at least one processor; wherein,

所述存储器存储有可被所述至少一个处理器执行的指令，所述指令被所述至少一个处理器执行，以使所述至少一个处理器能够：The memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to:

将所述梯度异步更新；updating the gradient asynchronously;

根据异步更新的梯度，对所述词及其上下文词的词向量进行更新。The word vectors of the word and its context words are updated according to the asynchronously updated gradient.

本说明书实施例采用的上述至少一个技术方案能够达到以下有益效果：在训练过程中，各工作机而无需相互等待，向服务器异步更新针对各词计算出的梯度，进而由服务器根据梯度更新各词的词向量，因此，有利于提高词向量训练收敛速度，再加上集群的分布式处理能力，使得该方案能够适用于大规模词向量训练且效率较高。The above-mentioned at least one technical solution adopted in the embodiment of this specification can achieve the following beneficial effects: in the training process, each working machine does not need to wait for each other, and asynchronously updates the gradient calculated for each word to the server, and then the server updates each word according to the gradient. Therefore, it is beneficial to improve the convergence speed of word vector training, coupled with the distributed processing capability of the cluster, making this scheme suitable for large-scale word vector training with high efficiency.

附图说明Description of drawings

为了更清楚地说明本说明书实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本说明书中记载的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of this specification or the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only These are some embodiments described in this specification. Those skilled in the art can also obtain other drawings based on these drawings without any creative effort.

图1为本说明书的方案在一种实际应用场景下涉及的一种整体架构示意图；FIG. 1 is a schematic diagram of an overall architecture involved in a practical application scenario of the solution in this specification;

图2为本说明书实施例提供的一种基于集群的词向量处理方法的流程示意图；Fig. 2 is a schematic flow chart of a cluster-based word vector processing method provided by an embodiment of this specification;

图3为本说明书实施例提供的一种实际应用场景下，基于集群的词向量处理方法的原理示意图；FIG. 3 is a schematic diagram of the principle of a cluster-based word vector processing method in an actual application scenario provided by an embodiment of this specification;

图4为本说明书实施例提供的对应于图3的一种基于集群的词向量处理方法的详细流程示意图；Fig. 4 is a detailed flowchart of a cluster-based word vector processing method corresponding to Fig. 3 provided by the embodiment of this specification;

图5为本说明书实施例提供的对应于图2的一种基于集群的词向量处理装置的结构示意图。FIG. 5 is a schematic structural diagram of a cluster-based word vector processing device corresponding to FIG. 2 provided by the embodiment of this specification.

具体实施方式Detailed ways

本说明书实施例提供基于集群的词向量处理方法、装置以及设备。The embodiments of this specification provide a cluster-based word vector processing method, device and equipment.

为了使本技术领域的人员更好地理解本说明书中的技术方案，下面将结合本说明书实施例中的附图，对本说明书实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本申请一部分实施例，而不是全部的实施例。基于本说明书实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都应当属于本申请保护的范围。In order to enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below in conjunction with the drawings in the embodiments of this specification. Obviously, the described The embodiments are only some of the embodiments of the present application, but not all of them. Based on the embodiments of this specification, all other embodiments obtained by persons of ordinary skill in the art without creative efforts shall fall within the scope of protection of this application.

本说明书的方案适用于集群，在集群下对于大规模词向量的处理效率更高，具体地：可以拆分训练语料，集群中的多个工作机分布式地分别根据拆分的部分语料，配合一个或者多个服务器训练所述部分语料对应的词向量，在训练过程中，各工作机负责计算各词对应的梯度，并异步更新至服务器，服务器负责根据梯度更新词向量。The solution in this manual is suitable for clusters, and the processing efficiency of large-scale word vectors is higher in clusters. Specifically: the training corpus can be split, and multiple working machines in the cluster are distributed according to the split part of the corpus. One or more servers train the word vectors corresponding to the part of the corpus. During the training process, each working machine is responsible for calculating the gradient corresponding to each word, and asynchronously updates to the server, and the server is responsible for updating the word vector according to the gradient.

方案涉及的集群可以有一个或者多个，以图1为例，涉及了两个集群。There can be one or more clusters involved in the solution. Taking Figure 1 as an example, two clusters are involved.

图1为本说明书的方案在一种实际应用场景下涉及的一种整体架构示意图。该整体架构中，主要涉及三部分：包含多个服务器的服务器集群、包含多个工作机的工作机集群、数据库。数据库保存有用于训练的语料，供工作机集群读取，服务器集群保存原始的词向量，工作机集群与服务器集群进行配合，通过异步更新梯度，实现对词向量的训练。FIG. 1 is a schematic diagram of an overall architecture involved in the solution of this specification in a practical application scenario. The overall architecture mainly involves three parts: a server cluster including multiple servers, a working machine cluster including multiple working machines, and a database. The database saves the corpus for training, which is read by the worker cluster, and the server cluster saves the original word vectors. The worker cluster and the server cluster cooperate to realize the training of word vectors by updating the gradient asynchronously.

图1中的架构是示例性的，并非唯一。比如，方案也可以只涉及一个集群，该集群中包含至少一个调度机和多个工作机，由调度机完成上述服务器集群的工作；再比如，方案也可以涉及一个工作机集群和一个服务器；等等。The architecture in Figure 1 is exemplary and not exclusive. For example, the solution may only involve one cluster, and the cluster includes at least one scheduler and multiple worker machines, and the scheduler completes the work of the above server cluster; another example, the solution may also involve a worker cluster and a server; etc. Wait.

下面基于图1中的架构，对本说明书的方案进行详细说明。Based on the architecture in FIG. 1 , the solution of this specification will be described in detail below.

图2为本说明书实施例提供的一种基于集群的词向量处理方法的流程示意图，所述集群包括工作机集群和服务器集群。图2中各步骤由集群中的至少一个机器(或者机器上的程序)执行，不同步骤的执行主体可以不同，图2中的流程可以执行多轮，每轮可以使用不同组的语料，语料用于训练词向量。FIG. 2 is a schematic flowchart of a cluster-based word vector processing method provided by an embodiment of this specification, and the cluster includes a working machine cluster and a server cluster. Each step in Figure 2 is executed by at least one machine (or program on the machine) in the cluster, and the execution subjects of different steps can be different. The process in Figure 2 can be executed in multiple rounds, and each round can use different groups of corpus. for training word vectors.

图2中的流程包括以下步骤：The flow in Figure 2 includes the following steps:

S202：工作机集群包含的各工作机分别获取从部分语料中提取的词及其上下文词。S202: Each working machine included in the working machine cluster respectively obtains words extracted from part of the corpus and context words thereof.

S204：所述工作机获取所述词及其上下文词的词向量。S204: The working machine acquires word vectors of the word and its context words.

S206：所述工作机根据所述词及其上下文词，以及对应的词向量，计算梯度。S206: The working machine calculates a gradient according to the word, its context word, and the corresponding word vector.

S208：所述工作机将所述梯度异步更新至服务器集群包含的服务器。S208: The working machine asynchronously updates the gradient to the servers included in the server cluster.

S210：所述服务器根据所述梯度，对所述词及其上下文词的词向量进行更新。S210: The server updates word vectors of the word and its context words according to the gradient.

在本说明书实施例中，各工作机可以分布式地执行步骤S202～S208，各工作机对应的部分语料通常是不同的，如此能够高效利用大规模的训练语料，也能够提高词向量的训练效率。比如，对于当前用于训练词向量的语料，可以将语料拆分为多份，各工作机可以分别读取一部分，进而基于自己读取的部分语料执行步骤S202～S208。In the embodiment of this specification, each working machine can perform steps S202-S208 in a distributed manner, and the part of the corpus corresponding to each working machine is usually different, so that large-scale training corpus can be efficiently used, and the training efficiency of word vectors can also be improved. . For example, for the corpus currently used for training word vectors, the corpus can be split into multiple parts, and each working machine can read a part, and then perform steps S202-S208 based on the part of the corpus it reads.

为了便于描述，对于步骤S202～S208，以下各实施例主要从某一个工作机的角度进行说明。For ease of description, for steps S202 to S208, the following embodiments are mainly described from the perspective of a certain working machine.

在本说明书实施例中，若本轮流程是第一轮流程，步骤S204中获取的词向量可以是初始化得到的。比如，可以采用随机初始化的方式或者按照指定概率分布初始化的方式，初始化各词的词向量，以及各词的上下文词的词向量，指定概率分布比如是0-1分布等。而若本轮流程并非第一轮流程，则步骤S204中获取的词向量可以是上轮流程执行完毕后更新并保存的词向量。In the embodiment of this specification, if the current round of process is the first round of process, the word vector acquired in step S204 may be obtained by initialization. For example, the word vector of each word and the word vector of the context word of each word can be initialized randomly or according to a specified probability distribution, and the specified probability distribution is, for example, 0-1 distribution. If the current round of the process is not the first round of the process, the word vector obtained in step S204 may be the word vector updated and saved after the execution of the previous round of the process.

在本说明书实施例中，训练词向量的过程主要包括计算梯度以及根据梯度更新向量，分别由工作机集群和服务器集群执行。在训练过程中，工作机计算完成后，需要将结果同步到服务器，通常有两种模式：同步更新与异步更新。同步更新是指：各工作机采用某种方式进行模型平均后再更新至服务器(一般地，不同的平均策略会造成不同的结果，模型平均的策略设计是同步更新重要的一环)。而异步更新是指任一个工作机计算完成就立即向服务器更新数据，而不等待其他工作机更不用进行模型平均。从最终效果上讲，异步更新由于不需要等待其他工作机计算完成，因此训练收敛速度往往更快，本说明书的方案主要基于异步更新的方式进行说明，具体异步更新的数据包括由工作机计算的各词对应的梯度。In the embodiment of this specification, the process of training word vectors mainly includes calculating gradients and updating vectors according to the gradients, which are respectively executed by worker clusters and server clusters. During the training process, after the calculation of the working computer is completed, the result needs to be synchronized to the server. There are usually two modes: synchronous update and asynchronous update. Synchronous update means that each working machine averages the model in a certain way and then updates it to the server (generally, different averaging strategies will result in different results, and the strategy design of model averaging is an important part of synchronous update). The asynchronous update means that any worker computer updates the data to the server immediately after the calculation is completed, without waiting for other workers and model averaging. In terms of the final effect, asynchronous update does not need to wait for other working computers to complete the calculation, so the training convergence speed is often faster. The solution in this manual is mainly based on the asynchronous update method. The specific asynchronous update data includes the data calculated by the working computer. The gradient corresponding to each word.

在本说明书实施例中，步骤S208由服务器集群执行，更新后的词向量也保存于服务器集群，以便下轮流程使用。当然，在图1以外的其他架构中，步骤S208也可以由与工作机属于同一集群的调度机或服务器执行。In the embodiment of this specification, step S208 is executed by the server cluster, and the updated word vectors are also stored in the server cluster for use in the next round of processes. Certainly, in other architectures than those shown in FIG. 1 , step S208 may also be executed by a scheduler or server belonging to the same cluster as the worker.

以此类推，进行多轮流程直至所有组的训练语料全部使用完毕后，服务器集群可以将最终更新得到的词向量写出到数据库，以便用于需求词向量的各种场景。By analogy, after performing multiple rounds of the process until all the training corpora of all groups are used up, the server cluster can write out the finally updated word vectors to the database for use in various scenarios that require word vectors.

通过图2的方法，在训练过程中，各工作机而无需相互等待，向服务器异步更新针对各词计算出的梯度，进而由服务器根据梯度更新各词的词向量，因此，有利于提高词向量训练收敛速度，再加上集群的分布式处理能力，使得该方案能够适用于大规模词向量训练且效率较高。Through the method in Figure 2, during the training process, each working machine does not need to wait for each other, and asynchronously updates the gradient calculated for each word to the server, and then the server updates the word vector of each word according to the gradient. Therefore, it is beneficial to improve the word vector The speed of training convergence, coupled with the distributed processing capability of the cluster, makes this scheme suitable for large-scale word vector training with high efficiency.

基于图2的方法，本说明书实施例还提供了该方法的一些具体实施方案，以及扩展方案，继续基于图1中的架构进行说明。Based on the method in FIG. 2 , the embodiment of this specification also provides some specific implementation solutions and extension solutions of the method, and the description will continue based on the architecture in FIG. 1 .

在本说明书实施例中，从语料中提取词及其上下文词可以由工作机执行，也可以由其他设备预先执行。以前一种方式为例，则对于步骤S202，所述获取从部分语料中提取的词及其上下文词前，还可以执行：各所述工作机分布式地读取得到部分语料。语料若保存于数据库，则可以从数据库读取。In the embodiment of this specification, the extraction of words and their context words from the corpus may be performed by a working machine, or may be performed in advance by other devices. Taking the previous method as an example, for step S202, before the acquisition of the words extracted from the partial corpus and their context words, it may also be performed: each of the working machines reads and obtains the partial corpus in a distributed manner. If the corpus is saved in the database, it can be read from the database.

在本说明书实施例中，所述获取从部分语料中提取的词及其上下文词，具体可以包括：根据自己所读取得到的语料，建立相应的词对，所述词对包含当前词及其上下词。比如，可以扫描自己所读取得到的语料中的词，当前扫描的词为当前词记作w，根据设定的滑窗距离确定包含w的一个滑窗，将该滑窗内的其他每个词分别作为w的一个上下文词，记作c，如此构成词对{w,c}。In the embodiment of this specification, the acquisition of words extracted from part of the corpus and their context words may specifically include: establishing corresponding word pairs based on the corpus that you have read, and the word pairs include the current word and its context words. Upper and lower words. For example, you can scan the words in the corpus you have read. The currently scanned word is recorded as w, and a sliding window containing w is determined according to the set sliding window distance, and each other in the sliding window is Words are respectively used as a context word of w, denoted as c, thus forming a word pair {w,c}.

进一步地，假定词向量保存于服务器集群包含的多个服务器上。则对于步骤S204，所述获取所述词及其上下文词的词向量，具体可以包括：根据自己建立的各所述词对，提取得到当前词集合和上下文词集合；从所述服务器获取所述当前词集合和上下文词集合包含的词的词向量。当然，这并非唯一实施方式，比如，也可以在扫描语料时，同步地从服务器获取当前扫描到的词的词向量而未必要依赖于建立的词对，等等。Further, it is assumed that word vectors are stored on multiple servers included in the server cluster. Then for step S204, the acquisition of the word vector of the word and its context word may specifically include: extracting the current word set and the context word set according to each of the word pairs established by oneself; obtaining the word vector from the server Word vectors of the words contained in the current word set and the context word set. Of course, this is not the only implementation mode. For example, when scanning the corpus, the word vector of the currently scanned word can be obtained from the server synchronously without necessarily relying on the established word pair, and so on.

在本说明书实施例中，可以根据指定的损失函数，自己建立的各所述词对，以及所述词及其上下文词的词向量，计算各词分别对应的梯度。In the embodiment of this specification, the gradient corresponding to each word can be calculated according to the specified loss function, each word pair created by itself, and the word vector of the word and its context word.

为了获得更好的训练效果以及更快地收敛，还可以引入指定的负样例词作为上下文词的对照计算梯度，负样例词被视为：相比于上下文词，与对应的当前词相关性相对低的词，一般可以在全部词中随机选择若干个。在这种情况下，对于步骤S206，所述根据所述词及其上下文词，以及对应的词向量，计算梯度，具体可以包括：根据指定的损失函数、负样例词、自己建立的各所述词对，以及所述词及其上下文词的词向量，计算各词分别对应的梯度。In order to obtain better training effect and faster convergence, the specified negative sample words can also be introduced as the contrast calculation gradient of the context words. The negative sample words are regarded as: compared with the context words, they are related to the corresponding current words Generally speaking, several words can be randomly selected from all words. In this case, for step S206, the calculation of the gradient according to the word and its context word, and the corresponding word vector may specifically include: according to the specified loss function, negative sample words, and each established The predicate pair, and the word vectors of the word and its context word, calculate the gradient corresponding to each word.

当前词及其每个负样例词也可以构成一个词对(称为负样例词对)，用c'表示负样例词，负样例词对记作{w,c’}，假定有λ个负样例词，相应的λ个负样例词对可以记作{w,c’₁}、{w,c’₂}、…、{w,c’_λ}，为了便于描述将负样例词对和上面的上下文词对(当前词及其上下文词构成的词对)统一记作{w,c}，并用y来区分，对于上下文词对，y＝1，对于负样例词对，y＝0。The current word and each of its negative sample words can also form a word pair (called a negative sample word pair), c' is used to represent a negative sample word, and the negative sample word pair is denoted as {w,c'}, assuming There are λ negative sample words, and the corresponding λ negative sample word pairs can be recorded as {w,c' ₁ }, {w,c' ₂ }, ..., {w,c' _λ }, for the convenience of description, The negative example word pair and the above context word pair (the word pair formed by the current word and its context word) are collectively recorded as {w,c}, and are distinguished by y. For the context word pair, y=1, for the negative example word pair, y=0.

在本发明实施例中，上述的损失函数可以有多种形式，一般包含至少两项，一项反映当前词与其上下文之间的相似度，另一项反映当前词与其负样例词之间的相似度，其中，可以用向量点乘度量相似度，也可以采用其他方式度量相似度。以一种实际应用场景为例，比如利用以下公式计算当前词对应的梯度▽：In the embodiment of the present invention, the above-mentioned loss function can have various forms, generally including at least two items, one item reflects the similarity between the current word and its context, and the other item reflects the similarity between the current word and its negative sample word Similarity, where the similarity can be measured by vector dot product, or can be measured in other ways. Taking a practical application scenario as an example, for example, use the following formula to calculate the gradient ▽ corresponding to the current word:

其中，表示w的词向量，表示c的词向量，σ是激活函数，假定为Sigmoid函数，则 in, Represents the word vector of w, Represents the word vector of c, σ is the activation function, assumed to be a Sigmoid function, then

进一步地，每个工作机上的一个或者多个线程可以以异步计算且不加锁更新的方式，计算梯度。从而，工作机内各线程也可以并行计算梯度且不会相互妨碍，能够进一步地提高计算效率。Furthermore, one or more threads on each worker machine can calculate gradients in an asynchronous and update-free manner. Therefore, each thread in the working machine can also calculate the gradient in parallel without interfering with each other, which can further improve the calculation efficiency.

在本说明书实施例中，对于步骤S208，所述工作机将所述梯度异步更新至所述服务器，具体可以包括：所述工作机计算得到所述梯度后，将所述梯度发送给所述服务器，其中，所述发送动作的执行无需等待其他工作机向所述服务器发送梯度。In the embodiment of this specification, for step S208, the working machine asynchronously updates the gradient to the server, which may specifically include: after the working machine calculates the gradient, sends the gradient to the server , wherein the execution of the sending action does not need to wait for other working machines to send gradients to the server.

在本说明书实施例中，服务器获得工作机异步更新的梯度后，可以利用该梯度更新对应的当前词的词向量。不仅如此，服务器还可以利用该梯度，更新当前词的上下文词以及负样例词的词向量，具体的更新方式可以参照梯度下降法进行。In the embodiment of this specification, after the server obtains the gradient updated asynchronously by the working machine, it can use the gradient to update the corresponding word vector of the current word. Not only that, the server can also use the gradient to update the context word of the current word and the word vector of the negative sample word. The specific update method can refer to the gradient descent method.

例如，对于步骤S210，所述服务器根据所述梯度，对所述词及其上下文词的词向量进行更新，具体可以包括：For example, for step S210, the server updates the word vectors of the word and its context words according to the gradient, which may specifically include:

按照以下公式，对所述词及其上下文词，以及所述负样例词的词向量进行迭代更新：According to the following formula, the word and its context word, and the word vector of the negative sample word are iteratively updated:

其中，w表示当前词，c表示w的上下文词，c'表示负样例词，表示w的词向量，表示c的词向量，和表示在所述服务器上的第t次更新，B_k表示所述工作机上第k组语料，Γ(w)表示w的上下文词和负样例词的集合，α表示学习率，σ比如为Sigmoid函数。in, w represents the current word, c represents the context word of w, c' represents the negative sample word, Represents the word vector of w, Represents the word vector of c, and Represents the tth update on the server, B _k represents the kth group of corpus on the working machine, Γ(w) represents the set of context words and negative sample words of w, α represents the learning rate, and σ is, for example, Sigmoid function.

根据上面的说明，本说明书实施例还提供了一种实际应用场景下，基于集群的词向量处理方法的原理示意图，如图3所示，进一步地，本说明书实施例还提供了对应于图3的一种基于集群的词向量处理方法的详细流程示意图，如图4所示。According to the above description, the embodiment of this specification also provides a schematic diagram of the principle of a cluster-based word vector processing method in an actual application scenario, as shown in Figure 3. Further, the embodiment of this specification also provides A detailed flowchart of a cluster-based word vector processing method, as shown in Figure 4.

在图3中，示例性地示出了工作机0～2、服务器0～2，主要针对工作机0进行说明，而工作机1和2简略地进行了表示，工作方式与工作机0是一致的。“wid”、“cid”为标识，分别表示当前词和上下文词，“wid list”、“cid list”是标识列表，分别表示当前词集合和上下文词集合。图3中的简略工作流程包括：各工作机分布式地读取语料，建立词对；各工作机从服务器集群获取相应的词向量；各工作机利用读取的语料计算梯度并异步更新至服务器集群；服务器集群根据梯度更新词向量。In Fig. 3, the working machines 0-2 and the servers 0-2 are exemplarily shown, and the description mainly focuses on the working machine 0, while the working machines 1 and 2 are briefly shown, and the working mode is the same as that of the working machine 0. of. "wid" and "cid" are identifiers, representing the current word and the context word respectively, and "wid list" and "cid list" are identifier lists, representing the current word set and the context word set respectively. The simplified workflow in Figure 3 includes: each working machine reads the corpus in a distributed manner and creates word pairs; each working machine obtains the corresponding word vector from the server cluster; each working machine calculates the gradient using the read corpus and updates it to the server asynchronously Cluster; the server cluster updates word vectors according to the gradient.

图4中示出了更详细的流程，主要包括以下步骤：A more detailed process is shown in Figure 4, which mainly includes the following steps:

S402：各工作机分布式地读取部分语料，建立词对{w,c}，从词对中提取wid list和cid list，如图4中的工作机0所示。S402: Each working machine reads part of the corpus in a distributed manner, creates word pairs {w, c}, and extracts wid list and cid list from the word pairs, as shown by working machine 0 in FIG. 4 .

S404：工作机根据wid list和cid list从服务器拉取对应的词向量，服务器发送对应的词向量给工作机。S404: The working machine pulls the corresponding word vector from the server according to the wid list and the cid list, and the server sends the corresponding word vector to the working machine.

S406：工作机根据词对和对应的词向量，计算梯度，具体采用上述的公式一进行计算。S406: The working machine calculates the gradient according to the word pairs and the corresponding word vectors, specifically using the above formula 1 for calculation.

S408：工作机的每个线程均以异步计算且不加锁更新的方式，计算梯度，完成梯度计算后，不等待其他工作机，直接将计算出的该工作机上所有词对应的梯度传给服务器。S408: Each thread of the working machine calculates the gradient in an asynchronous manner without locking and updating. After completing the gradient calculation, it does not wait for other working machines, and directly transmits the calculated gradients corresponding to all the words on the working machine to the server. .

S410：服务器集群根据梯度更新词向量，具体采用上述的公式二和公式三进行计算。S410: The server cluster updates the word vector according to the gradient, specifically using the above formula 2 and formula 3 for calculation.

基于同样的思路，本说明书实施例还提供了上述方法的对应装置，如图5所示。Based on the same idea, the embodiment of this specification also provides a corresponding device of the above method, as shown in FIG. 5 .

图5为本说明书实施例提供的对应于图2的一种基于集群的词向量处理装置的结构示意图，所述集群包括多个工作机和服务器，所述装置位于所述集群，包括位于所述工作机的第一获取模块501、第二获取模块502、梯度计算模块503、异步更新模块504、位于所述服务器的词向量更新模块505；Fig. 5 is a schematic structural diagram of a cluster-based word vector processing device corresponding to Fig. 2 provided by the embodiment of this specification. The cluster includes a plurality of working machines and servers, and the device is located in the cluster, including the The first acquisition module 501 of the working machine, the second acquisition module 502, the gradient calculation module 503, the asynchronous update module 504, and the word vector update module 505 located at the server;

所述第一获取模块501获取从部分语料中提取的词及其上下文词；The first obtaining module 501 obtains words and context words thereof extracted from part of the corpus;

所述第二获取模块502获取所述词及其上下文词的词向量；The second obtaining module 502 obtains the word vector of the word and its context word;

所述梯度计算模块503根据所述词及其上下文词，以及对应的词向量，计算梯度；The gradient calculation module 503 calculates the gradient according to the word and its context words, and the corresponding word vector;

所述异步更新模块504将所述梯度异步更新至所述服务器；The asynchronous update module 504 asynchronously updates the gradient to the server;

所述服务器的所述词向量更新模块505根据所述梯度，对所述词及其上下文词的词向量进行更新。The word vector updating module 505 of the server updates the word vectors of the word and its context words according to the gradient.

可选地，所述第一获取模块501获取从部分语料中提取的词及其上下文词前，分布式地读取得到部分语料；Optionally, before the first acquisition module 501 acquires the words extracted from the partial corpus and their context words, read the partial corpus in a distributed manner;

所述第一获取模块501获取从部分语料中提取的词及其上下文词，具体包括：The first acquiring module 501 acquires words and context words thereof extracted from part of the corpus, specifically including:

所述第一获取模块501根据自己所读取得到的语料，建立相应的词对，所述词对包含当前词及其上下词。The first acquisition module 501 creates corresponding word pairs according to the corpus it has read, and the word pairs include the current word and its context words.

可选地，所述第二获取模块502获取所述词及其上下文词的词向量，具体包括：Optionally, the second acquiring module 502 acquires word vectors of the word and its context words, specifically including:

所述第二获取模块502根据所述第一获取模块501建立的各所述词对，提取得到当前词集合和上下文词集合；The second acquisition module 502 extracts the current word set and the context word set according to each of the word pairs established by the first acquisition module 501;

从所述服务器获取所述当前词集合和上下文词集合包含的词的词向量。Obtain word vectors of words contained in the current word set and the context word set from the server.

可选地，所述梯度计算模块503根据所述词及其上下文词，以及对应的词向量，计算梯度，具体包括：Optionally, the gradient calculation module 503 calculates the gradient according to the word and its context words, and the corresponding word vector, specifically including:

所述梯度计算模块503根据指定的损失函数、负样例词、自己建立的各所述词对，以及所述词及其上下文词的词向量，计算各词分别对应的梯度。The gradient calculation module 503 calculates the gradient corresponding to each word according to the specified loss function, negative sample words, each word pair created by itself, and the word vector of the word and its context word.

可选地，所述梯度计算模块503计算梯度，具体包括：Optionally, the gradient calculation module 503 calculates the gradient, specifically including:

所述梯度计算模块503的一个或者多个线程以异步计算且不加锁更新的方式，计算梯度。One or more threads of the gradient calculation module 503 calculate gradients in a manner of asynchronous calculation and update without locking.

可选地，所述异步更新模块504将所述梯度异步更新至所述服务器，具体包括：Optionally, the asynchronous update module 504 asynchronously updates the gradient to the server, specifically including:

所述异步更新模块504在所述梯度计算模块503计算得到所述梯度后，将所述梯度发送给所述服务器，其中，所述发送动作的执行无需等待其他工作机的异步更新模块504向所述服务器发送梯度。The asynchronous update module 504 sends the gradient to the server after the gradient calculation module 503 calculates the gradient, wherein the execution of the sending action does not need to wait for the asynchronous update module 504 of other working machines to send the gradient to the server. The server sends gradients.

可选地，所述词向量更新模块505根据所述梯度，对所述词及其上下文词的词向量进行更新，具体包括：Optionally, the word vector update module 505 updates the word vectors of the word and its context words according to the gradient, specifically including:

所述词向量更新模块505按照以下公式，对所述词及其上下文词，以及所述负样例词的词向量进行迭代更新：The word vector update module 505 iteratively updates the word and its context words, and the word vector of the negative example word according to the following formula:

其中，w表示当前词，c表示w的上下文in, w represents the current word, c represents the context of w

词，c'表示负样例词，表示w的词向量，表示c的词向量，和表示word, c' represents a negative sample word, Represents the word vector of w, Represents the word vector of c, and express

在所述服务器上的第t次更新，B_k表示所述工作机上第k组语料，Γ(w)表In the tth update on the server, B _k represents the kth group of corpus on the working machine, Γ(w) table

示w的上下文词和负样例词的集合，α表示学习率，σ为Sigmoid函数。Shows the set of context words and negative sample words of w, α represents the learning rate, and σ is the Sigmoid function.

基于同样的思路，本说明书实施例还提供了对应于图2的一种基于集群的词向量处理设备，该设备属于所述集群，包括：Based on the same idea, the embodiment of this specification also provides a cluster-based word vector processing device corresponding to Figure 2, which belongs to the cluster, including:

至少一个处理器；以及，at least one processor; and,

将所述梯度异步更新；updating the gradient asynchronously;

基于同样的思路，本说明书实施例还提供了对应于图2的一种非易失性计算机存储介质，存储有计算机可执行指令，所述计算机可执行指令设置为：Based on the same idea, the embodiment of this specification also provides a non-volatile computer storage medium corresponding to FIG. 2, which stores computer-executable instructions, and the computer-executable instructions are set to:

将所述梯度异步更新；updating the gradient asynchronously;

上述对本说明书特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下，在权利要求书中记载的动作或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外，在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中，多任务处理和并行处理也是可以的或者可能是有利的。The foregoing describes specific embodiments of this specification. Other implementations are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Multitasking and parallel processing are also possible or may be advantageous in certain embodiments.

本说明书中的各个实施例均采用递进的方式描述，各个实施例之间相同相似的部分互相参见即可，每个实施例重点说明的都是与其他实施例的不同之处。尤其，对于装置、设备、非易失性计算机存储介质实施例而言，由于其基本相似于方法实施例，所以描述的比较简单，相关之处参见方法实施例的部分说明即可。Each embodiment in this specification is described in a progressive manner, the same and similar parts of each embodiment can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the apparatus, equipment, and non-volatile computer storage medium embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and for relevant parts, please refer to part of the description of the method embodiments.

本说明书实施例提供的装置、设备、非易失性计算机存储介质与方法是对应的，因此，装置、设备、非易失性计算机存储介质也具有与对应方法类似的有益技术效果，由于上面已经对方法的有益技术效果进行了详细说明，因此，这里不再赘述对应装置、设备、非易失性计算机存储介质的有益技术效果。The device, device, and non-volatile computer storage medium provided in the embodiments of this specification correspond to the method. Therefore, the device, device, and non-volatile computer storage medium also have beneficial technical effects similar to those of the corresponding method. Since the above The beneficial technical effects of the method are described in detail, therefore, the beneficial technical effects of the corresponding devices, equipment, and non-volatile computer storage media will not be repeated here.

在20世纪90年代，对于一个技术的改进可以很明显地区分是硬件上的改进(例如，对二极管、晶体管、开关等电路结构的改进)还是软件上的改进(对于方法流程的改进)。然而，随着技术的发展，当今的很多方法流程的改进已经可以视为硬件电路结构的直接改进。设计人员几乎都通过将改进的方法流程编程到硬件电路中来得到相应的硬件电路结构。因此，不能说一个方法流程的改进就不能用硬件实体模块来实现。例如，可编程逻辑器件(Programmable Logic Device,PLD)(例如现场可编程门阵列(Field Programmable GateArray，FPGA))就是这样一种集成电路，其逻辑功能由用户对器件编程来确定。由设计人员自行编程来把一个数字系统“集成”在一片PLD上，而不需要请芯片制造厂商来设计和制作专用的集成电路芯片。而且，如今，取代手工地制作集成电路芯片，这种编程也多半改用“逻辑编译器(logic compiler)”软件来实现，它与程序开发撰写时所用的软件编译器相类似，而要编译之前的原始代码也得用特定的编程语言来撰写，此称之为硬件描述语言(Hardware Description Language，HDL)，而HDL也并非仅有一种，而是有许多种，如ABEL(Advanced Boolean Expression Language)、AHDL(Altera Hardware DescriptionLanguage)、Confluence、CUPL(Cornell University Programming Language)、HDCal、JHDL(Java Hardware Description Language)、Lava、Lola、MyHDL、PALASM、RHDL(RubyHardware Description Language)等，目前最普遍使用的是VHDL(Very-High-SpeedIntegrated Circuit Hardware Description Language)与Verilog。本领域技术人员也应该清楚，只需要将方法流程用上述几种硬件描述语言稍作逻辑编程并编程到集成电路中，就可以很容易得到实现该逻辑方法流程的硬件电路。In the 1990s, the improvement of a technology can be clearly distinguished as an improvement in hardware (for example, improvements in circuit structures such as diodes, transistors, and switches) or improvements in software (improvement in method flow). However, with the development of technology, the improvement of many current method flows can be regarded as the direct improvement of the hardware circuit structure. Designers almost always get the corresponding hardware circuit structure by programming the improved method flow into the hardware circuit. Therefore, it cannot be said that the improvement of a method flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (Programmable Logic Device, PLD) (such as a Field Programmable Gate Array (Field Programmable Gate Array, FPGA)) is such an integrated circuit, and its logic function is determined by programming the device by a user. It is programmed by the designer to "integrate" a digital system on a PLD, instead of asking a chip manufacturer to design and make a dedicated integrated circuit chip. Moreover, nowadays, instead of making integrated circuit chips by hand, this kind of programming is mostly realized by "logic compiler (logic compiler)" software, which is similar to the software compiler used when writing programs. The original code of the computer must also be written in a specific programming language, which is called a hardware description language (Hardware Description Language, HDL), and there is not only one kind of HDL, but many kinds, such as ABEL (Advanced Boolean Expression Language) , AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, RHDL (Ruby Hardware Description Language), etc., currently the most commonly used is VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. It should also be clear to those skilled in the art that only a little logical programming of the method flow in the above-mentioned hardware description languages and programming into an integrated circuit can easily obtain a hardware circuit for realizing the logic method flow.

控制器可以按任何适当的方式实现，例如，控制器可以采取例如微处理器或处理器以及存储可由该(微)处理器执行的计算机可读程序代码(例如软件或固件)的计算机可读介质、逻辑门、开关、专用集成电路(Application Specific Integrated Circuit，ASIC)、可编程逻辑控制器和嵌入微控制器的形式，控制器的例子包括但不限于以下微控制器：ARC 625D、Atmel AT91SAM、Microchip PIC18F26K20以及Silicone Labs C8051F320，存储器控制器还可以被实现为存储器的控制逻辑的一部分。本领域技术人员也知道，除了以纯计算机可读程序代码方式实现控制器以外，完全可以通过将方法步骤进行逻辑编程来使得控制器以逻辑门、开关、专用集成电路、可编程逻辑控制器和嵌入微控制器等的形式来实现相同功能。因此这种控制器可以被认为是一种硬件部件，而对其内包括的用于实现各种功能的装置也可以视为硬件部件内的结构。或者甚至，可以将用于实现各种功能的装置视为既可以是实现方法的软件模块又可以是硬件部件内的结构。The controller may be implemented in any suitable way, for example the controller may take the form of a microprocessor or processor and a computer readable medium storing computer readable program code (such as software or firmware) executable by the (micro)processor , logic gates, switches, Application Specific Integrated Circuit (ASIC), programmable logic controllers, and embedded microcontrollers, examples of controllers include but are not limited to the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicone Labs C8051F320, the memory controller can also be implemented as part of the memory's control logic. Those skilled in the art also know that, in addition to realizing the controller in a purely computer-readable program code mode, it is entirely possible to make the controller use logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded The same function can be realized in the form of a microcontroller or the like. Therefore, such a controller can be regarded as a hardware component, and the devices included in it for realizing various functions can also be regarded as structures within the hardware component. Or even, means for realizing various functions can be regarded as a structure within both a software module realizing a method and a hardware component.

上述实施例阐明的系统、装置、模块或单元，具体可以由计算机芯片或实体实现，或者由具有某种功能的产品来实现。一种典型的实现设备为计算机。具体的，计算机例如可以为个人计算机、膝上型计算机、蜂窝电话、相机电话、智能电话、个人数字助理、媒体播放器、导航设备、电子邮件设备、游戏控制台、平板计算机、可穿戴设备或者这些设备中的任何设备的组合。The systems, devices, modules, or units described in the above embodiments can be specifically implemented by computer chips or entities, or by products with certain functions. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or Combinations of any of these devices.

为了描述的方便，描述以上装置时以功能分为各种单元分别描述。当然，在实施本说明书时可以把各单元的功能在同一个或多个软件和/或硬件中实现。For the convenience of description, when describing the above devices, functions are divided into various units and described separately. Of course, when implementing this specification, the functions of each unit can be implemented in one or more pieces of software and/or hardware.

本领域内的技术人员应明白，本说明书实施例可提供为方法、系统、或计算机程序产品。因此，本说明书实施例可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本说明书实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of this specification may be provided as methods, systems, or computer program products. Accordingly, the embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

本说明书是参照根据本说明书实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The specification is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the specification. It should be understood that each procedure and/or block in the flowchart and/or block diagram, and a combination of procedures and/or blocks in the flowchart and/or block diagram can be realized by computer program instructions. These computer program instructions may be provided to a general purpose computer, special purpose computer, embedded processor, or processor of other programmable data processing equipment to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing equipment produce a An apparatus for realizing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中，使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品，该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instructions The device realizes the function specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上，使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, thereby The instructions provide steps for implementing the functions specified in the flow chart or blocks of the flowchart and/or the block or blocks of the block diagrams.

在一个典型的配置中，计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

内存可能包括计算机可读介质中的非永久性存储器，随机存取存储器(RAM)和/或非易失性内存等形式，如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。Memory may include non-permanent storage in computer readable media, in the form of random access memory (RAM) and/or nonvolatile memory such as read only memory (ROM) or flash RAM. Memory is an example of computer readable media.

计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括，但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带，磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。按照本文中的界定，计算机可读介质不包括暂存电脑可读媒体(transitory media)，如调制的数据信号和载波。Computer-readable media, including both permanent and non-permanent, removable and non-removable media, can be implemented by any method or technology for storage of information. Information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cartridge, tape magnetic disk storage or other magnetic storage device or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media excludes transitory computer-readable media, such as modulated data signals and carrier waves.

还需要说明的是，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the term "comprises", "comprises" or any other variation thereof is intended to cover a non-exclusive inclusion such that a process, method, article, or apparatus comprising a set of elements includes not only those elements, but also includes Other elements not expressly listed, or elements inherent in the process, method, commodity, or apparatus are also included. Without further limitations, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus comprising said element.

本说明书可以在由计算机执行的计算机可执行指令的一般上下文中描述，例如程序模块。一般地，程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等等。也可以在分布式计算环境中实践本说明书，在这些分布式计算环境中，由通过通信网络而被连接的远程处理设备来执行任务。在分布式计算环境中，程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。The specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The present description may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.

本说明书中的各个实施例均采用递进的方式描述，各个实施例之间相同相似的部分互相参见即可，每个实施例重点说明的都是与其他实施例的不同之处。尤其，对于系统实施例而言，由于其基本相似于方法实施例，所以描述的比较简单，相关之处参见方法实施例的部分说明即可。Each embodiment in this specification is described in a progressive manner, the same and similar parts of each embodiment can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for relevant parts, refer to part of the description of the method embodiment.

以上所述仅为本说明书实施例而已，并不用于限制本申请。对于本领域技术人员来说，本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等，均应包含在本申请的权利要求范围之内。The above descriptions are only examples of the present specification, and are not intended to limit the present application. For those skilled in the art, various modifications and changes may occur in this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall be included within the scope of the claims of the present application.

Claims

1. a cluster-based word vector processing method, the cluster includes a plurality of work machines and servers, the method comprising:

Each of the working machines executes respectively:

Obtain words extracted from part of the corpus and their context words;

Obtain word vectors of the word and its context words;

Calculate the gradient according to the word and its context word, and the corresponding word vector;

asynchronously updating the gradient to the server;

The server updates word vectors of the word and its context words according to the gradient.

2. The method according to claim 1, before the described acquisition is extracted from part of the corpus and its context word, the method also includes:

Each of the working machines reads in a distributed manner to obtain part of the corpus;

The acquisition of words and context words extracted from part of the corpus specifically includes:

According to the corpus read by oneself, corresponding word pairs are established, and the word pairs include the current word and its upper and lower words.

3. The method according to claim 2, the word vectors of the described word and its context word are obtained, specifically comprising:

Extract the current word set and the context word set according to each of the word pairs established by oneself;

Obtain word vectors of words contained in the current word set and the context word set from the server.

4. the method for claim 2, described according to described word and context word thereof, and corresponding word vector, calculate gradient, specifically comprise:

Calculate the gradient corresponding to each word according to the specified loss function, negative sample words, each word pair created by yourself, and the word vector of the word and its context word.

5. The method according to claim 1, said calculating the gradient, specifically comprising:

One or more threads on the working machine calculate gradients in a manner of asynchronous calculation and update without locking.

6. The method according to claim 1, wherein the working machine asynchronously updates the gradient to the server, specifically comprising:

After the working computer calculates the gradient, it sends the gradient to the server, wherein the sending action does not need to wait for other working computers to send the gradient to the server.

7. The method according to claim 4, wherein the server updates the word vectors of the word and its context word according to the gradient, specifically comprising:

According to the following formula, the word and its context word, and the word vector of the negative sample word are iteratively updated:

in, w represents the current word, c represents the context word of w, c' represents the negative sample word, Represents the word vector of w, Represents the word vector of c, and Represents the tth update on the server, B _k represents the kth group of corpus on the working machine, Γ(w) represents the set of context words and negative sample words of w, α represents the learning rate, and σ is the Sigmoid function .

8. A cluster-based word vector processing device, the cluster includes a plurality of working machines and servers, the device is located in the cluster, including a first acquisition module, a second acquisition module, and a gradient calculation located in the working machine module, an asynchronous update module, a word vector update module located at the server;

Each working machine executes respectively through corresponding modules:

The first acquisition module acquires words extracted from part of the corpus and context words thereof;

The second obtaining module obtains the word vector of the word and its context word;

The gradient calculation module calculates the gradient according to the word and its context words, and the corresponding word vector;

The asynchronous update module asynchronously updates the gradient to the server;

The word vector updating module of the server updates the word vectors of the word and its context words according to the gradient.

9. The device according to claim 8, before the first acquisition module obtains the word extracted from the partial corpus and its context word, the partial corpus is read in a distributed manner;

The first acquisition module acquires words extracted from part of the corpus and context words thereof, specifically including:

The first acquisition module creates corresponding word pairs according to the corpus it has read, and the word pairs include the current word and its context words.

10. The device according to claim 9, the second acquisition module acquires the word vector of the word and its context word, specifically comprising:

The second acquisition module extracts the current word set and the context word set according to the word pairs established by the first acquisition module;

11. The device according to claim 9, the gradient calculation module calculates the gradient according to the word and its context words, and the corresponding word vector, specifically comprising:

The gradient calculation module calculates the gradient corresponding to each word according to the specified loss function, negative sample words, each word pair created by itself, and the word vector of the word and its context word.

12. The device according to claim 8, the gradient calculation module calculates the gradient, specifically comprising:

One or more threads of the gradient calculation module calculate gradients in a manner of asynchronous calculation and update without locking.

13. The device according to claim 8, wherein the asynchronous update module asynchronously updates the gradient to the server, specifically comprising:

The asynchronous update module sends the gradient to the server after the gradient calculation module calculates the gradient, wherein the execution of the sending action does not need to wait for the asynchronous update module of other working machines to send the gradient to the server. gradient.

14. The device according to claim 11, the word vector updating module updates the word vector of the word and its context word according to the gradient, specifically comprising:

The word vector update module performs iterative updates to the word and its context words, and the word vector of the negative example word according to the following formula:

15. A cluster-based word vector processing device, the device belonging to the cluster, comprising:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to:

Obtain words extracted from part of the corpus and their context words;

Obtain word vectors of the word and its context words;

updating the gradient asynchronously;

The word vectors of the word and its context words are updated according to the asynchronously updated gradient.