InferLite: Simple Universal Sentence Representations from Natural Language Inference Data

Last updated on April 6, 2021, 8:24 p.m. by tushar

A summary of the research paper and its important sentences

In the paper titled InferLite: Simple Universal Sentence Representations from Natural Language Inference Data, the authors Jamie Ryan Kiros and William Chan from Google Brain Toronto introduce a lightweight approach for representing sentences as embeddings that can be used for many NLP tasks. The method does not use any recurrent layers, so it largely ignores word order and global context; local context can optionally be captured with wider convolutional filters. The authors' model has four major parts: the encoder, the controller, the fusion layer and the reduction layer. First, several types of pre-trained word embeddings are concatenated for each word and passed to the encoder. The same concatenated word embeddings of the sentence are also passed to the controller. The encoder and the controller share the same underlying architecture, a stack of convolutional layers. The fusion layer then combines the outputs of the controller and the encoder in a weighted fashion, with the controller acting as a gate, and the final sentence representation is produced by the reduction layer, which applies max pooling over the tokens.
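To make the pipeline concrete, here is a minimal, hypothetical PyTorch sketch of the encoder/controller/fusion/reduction flow as described in this summary. The layer sizes, the ReLU activations and the exact sigmoid gating used for fusion are assumptions for illustration, not the authors' released configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class InferLiteSketch(nn.Module):
    """Illustrative encoder/controller/fusion/reduction pipeline."""
    def __init__(self, emb_dim, hidden_dim, num_layers=3, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2  # keep the sequence length unchanged
        def conv_stack():
            layers, in_dim = [], emb_dim
            for _ in range(num_layers):
                layers.append(nn.Conv1d(in_dim, hidden_dim, kernel_size, padding=pad))
                in_dim = hidden_dim
            return nn.ModuleList(layers)
        self.encoder = conv_stack()     # transforms the concatenated word embeddings
        self.controller = conv_stack()  # produces per-token gating weights

    def _run(self, stack, x):
        h = x.transpose(1, 2)           # Conv1d expects (batch, channels, seq_len)
        for conv in stack:
            h = F.relu(conv(h))
        return h.transpose(1, 2)        # back to (batch, seq_len, hidden_dim)

    def forward(self, word_embs):
        # word_embs: (batch, seq_len, emb_dim), the concatenation of the
        # pre-trained embedding types for each word
        enc = self._run(self.encoder, word_embs)
        gate = torch.sigmoid(self._run(self.controller, word_embs))
        fused = gate * enc              # fusion: weighted combination per token
        sentence, _ = fused.max(dim=1)  # reduction: max pooling over tokens
        return sentence                 # (batch, hidden_dim) sentence embedding

Setting kernel_size=1 corresponds to the "no context" variant mentioned in the excerpts below, while kernel_size=3 adds local context around each word.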

The model can be extended with semantic hashing, which learns generic binary sentence vectors. It is implemented as an additional layer after the reduction layer, given by h(s) = sigmoid(LN(Wx + b) / t), where LN is layer normalization and t is an exponentially decaying temperature hyperparameter.

The authors ran various experiments, and the results show that using local context for each word as well as fusing more word embedding types improves performance. The model is trained on NLI data (the concatenation of SNLI and MultiNLI) and evaluated on a range of downstream tasks. In these experiments, using the GloVe, news and query embeddings gave strong results comparable to InferSent, whereas including the other embedding types improved accuracy on NLI but did not improve performance on the downstream tasks.
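Read as code, the semantic-hashing layer above is roughly the following. This is a minimal sketch assuming a PyTorch setting; the output dimension and the decay schedule are placeholders, not the paper's exact values.

import math
import torch
import torch.nn as nn

class SemanticHashing(nn.Module):
    """h(s) = sigmoid(LN(W*x + b) / t), pushing outputs towards {0, 1}."""
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, code_dim)  # W*x + b
        self.norm = nn.LayerNorm(code_dim)       # LN(.)

    def forward(self, sentence_vec, temperature):
        return torch.sigmoid(self.norm(self.proj(sentence_vec)) / temperature)

def temperature(step, t0=1.0, decay=1e-4):
    # illustrative exponentially decaying temperature schedule
    return t0 * math.exp(-decay * step)

As the temperature decays towards zero, the sigmoid outputs saturate towards 0 or 1, which is what makes the learned codes usable as binary vectors.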

 

Important Sentences -

  • we propose a lightweight version of InferSent (Conneau et al., 2017), called InferLite, that does not use any recurrent layers and operates on a collection of pre-trained word embeddings.

  • Our method uses a controller to dynamically weight embeddings for each word followed by max pooling over components to obtain the final sentence representation.

  • With a lightweight encoder, we can encode millions of sentences efficiently without requiring extensive computational resources.

  • Moreover, we include an ablation study in the appendix that shows even innocent or seemingly irrelevant model decisions can have a drastic effect on performance.

  • Our method operates on a collection of pre-trained word representations and is then trained on the concatenation of SNLI (Bowman et al., 2015) and MultiNLI

  • Our method takes as input a collection of embeddings for each word and learns a gated controller to decide how to weight each representation. After encoding each word in a sentence, the sentence embedding is obtained by max pooling the transformed word representations.

  • We consider encoders that use convolutional filters of length 1 (no context) or length 3 (local context), with a stack of M = 3 convolutional layers.

  • Next we observe that adding local context helps significantly on MR, CR, SST2 and TREC tasks. Furthermore, fusing embeddings from query and news models matches or improves performance over a glove-only model on 12 out of 15 tasks. Our (glove+news+query,3) model is best on 5 tasks and is a generally strong performer across all evaluations.

  • While adding these embeddings improved performance on NLI, they did not lead to any performance gains on downstream tasks.

...
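Tying the excerpts together, here is a hedged usage sketch: concatenate several pre-trained embedding types per word and feed the result to the InferLiteSketch class from the earlier sketch. The GloVe, news and query tables below are toy stand-ins, not the authors' released resources, and the helper function is hypothetical, meant only to illustrate the input format.

import torch

def embed_sentence(tokens, embedding_tables):
    # embedding_tables: list of (word -> tensor dict, dimension) pairs;
    # unknown words fall back to zero vectors in this toy example
    vectors = []
    for tok in tokens:
        parts = [table.get(tok, torch.zeros(dim)) for table, dim in embedding_tables]
        vectors.append(torch.cat(parts))   # concatenate the embedding types
    return torch.stack(vectors)            # (seq_len, total_dim)

# toy stand-ins for the GloVe, news and query tables (300-d each)
glove = ({"good": torch.randn(300), "movie": torch.randn(300)}, 300)
news  = ({"good": torch.randn(300)}, 300)
query = ({"movie": torch.randn(300)}, 300)

model = InferLiteSketch(emb_dim=900, hidden_dim=512)
batch = embed_sentence(["good", "movie"], [glove, news, query]).unsqueeze(0)
with torch.no_grad():
    vec = model(batch)                     # (1, 512) sentence representation

Because the encoder is only a small stack of convolutions over pre-trained embeddings, batches like this can be pushed through very cheaply, which is what the authors mean by encoding millions of sentences without extensive computational resources.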

by tushar
KJ Somaiya College of Engineering Mumbai

Software Engineer | SWE Intern'21 @ConnectWise | Ex- Smollan | KJSCE CSE'22