deepmatcher.word_contextualizers

RNN

class deepmatcher.word_contextualizers.RNN(*args, **kwargs)[source]

Multi-layered RNN-based Word Contextualizer.

Supports dropout and residual/highway connections. Takes the same parameters as the deepmatcher.modules.RNN module.
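
For instance, the contextualizer can be plugged into a matching model as in the sketch below. This assumes the underlying module accepts a unit_type keyword, as deepmatcher.modules.RNN does; the choice of 'gru' is illustrative.

    import deepmatcher as dm

    # A minimal sketch: use an RNN word contextualizer (here a GRU, an
    # illustrative choice) inside the Hybrid attribute summarizer. The
    # unit_type keyword is forwarded to the underlying RNN module.
    model = dm.MatchingModel(
        attr_summarizer=dm.attr_summarizers.Hybrid(
            word_contextualizer=dm.word_contextualizers.RNN(unit_type='gru')
        )
    )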

SelfAttention

class deepmatcher.word_contextualizers.SelfAttention(heads=1, hidden_size=None, input_dropout=0, alignment_network='decomposable', scale=False, score_dropout=0, value_transform_network=None, value_merge='concat', transform_dropout=0, output_transform_network=None, output_dropout=0, bypass_network='highway', input_size=None)[source]

Self-attention-based Word Contextualizer.

Supports both vanilla self-attention and multi-head self-attention.
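
As a minimal sketch, a vanilla single-head instance can be used as the word contextualizer of the Hybrid attribute summarizer, with every parameter left at the defaults documented below:

    import deepmatcher as dm

    # A minimal sketch: vanilla (single-head) self attention as the word
    # contextualizer. Every parameter keeps its documented default.
    model = dm.MatchingModel(
        attr_summarizer=dm.attr_summarizers.Hybrid(
            word_contextualizer=dm.word_contextualizers.SelfAttention()
        )
    )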

Parameters:
  • heads (int) – Number of attention heads to use. Defaults to 1.
  • hidden_size (int) – The default hidden size of the alignment_network and transform networks, if they are not disabled.
  • input_dropout (float) – If non-zero, applies dropout to the input to this module. Dropout probability must be between 0 and 1.
  • alignment_network (string or deepmatcher.modules.AlignmentNetwork or callable) – The neural network that takes the input sequence, aligns each word in the sequence with the other words in the sequence, and returns the corresponding alignment score matrix. Argument must specify an Align operation.
  • scale (bool) – Whether to scale the alignment scores by the square root of the hidden_size parameter. Based on scaled dot-product attention.
  • score_dropout (float) – If non-zero, applies dropout to the alignment score matrix. Dropout probability must be between 0 and 1.
  • value_transform_network (string or Transform or callable) – For each word embedding in the input sequence, SelfAttention takes a weighted average of the aligning values, i.e., the aligning word embeddings, weighted by the alignment scores. This parameter specifies the neural network used to transform the values (word embeddings) before taking the weighted average. Argument must be None or specify a Transform operation. If the argument is a string, the hidden size of the transform operation is computed as hidden_size // heads. If the argument is None and heads is 1, the values are not transformed. If the argument is None and heads is greater than 1, a 1-layer highway network without any non-linearity is used; its hidden size is computed as described above.
  • value_merge (string or Merge or callable) – For each word embedding in the input sequence, each SelfAttention head produces one corresponding vector as output. This parameter specifies how to merge the outputs of all attention heads for each word embedding. Concatenates the outputs of all heads by default. Argument must specify a Merge operation.
  • transform_dropout (float) – If non-zero, applies dropout to the output of the value_transform_network, if applicable. Dropout probability must be between 0 and 1.
  • output_transform_network (string or Transform or callable) – For each word embedding in the input sequence, SelfAttention produces one corresponding vector as output. This parameter specifies the neural network used to transform each of these output vectors into a hidden representation of size hidden_size. Argument must be None or specify a Transform operation. If the argument is None and heads is 1, the output vectors are not transformed. If the argument is None and heads is greater than 1, a 1-layer highway network without any non-linearity is used.
  • output_dropout (float) – If non-zero, applies dropout to the output of the output_transform_network, if applicable. Dropout probability must be between 0 and 1.
  • bypass_network (string or Bypass or callable) – The bypass network (e.g., residual or highway network) to use. The input word embedding sequence to this module is treated as the raw input to the bypass network, and the final output vector sequence (the output of value_merge, or of output_transform_network if applicable) is treated as the transformed input. Argument must specify a Bypass operation. If None, no bypass network is used.
  • input_size (int) – The number of features in the input to the module. This parameter will be automatically specified by LazyModule.
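
Putting several of these parameters together, the sketch below configures multi-head self-attention. The values heads=4 and hidden_size=300 are illustrative choices, not library defaults; the remaining arguments restate documented defaults to make the head-merging and bypass behavior explicit.

    import deepmatcher as dm

    # A hedged sketch of a multi-head configuration. With heads=4 and
    # hidden_size=300, each head transforms values to size 300 // 4 = 75,
    # and value_merge='concat' joins the per-head outputs back together.
    self_attention = dm.word_contextualizers.SelfAttention(
        heads=4,                    # number of attention heads (illustrative)
        hidden_size=300,            # shared hidden size (illustrative)
        scale=True,                 # scale scores by sqrt(hidden_size)
        value_merge='concat',       # documented default, stated explicitly
        bypass_network='highway',   # documented default, stated explicitly
    )

    model = dm.MatchingModel(
        attr_summarizer=dm.attr_summarizers.Hybrid(
            word_contextualizer=self_attention
        )
    )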