Revealing Lexical Meanings: Contextual Vectors and Clustering
#linguistics #context
In Natural Language Processing (NLP), understanding the contextual subtleties that modulate a word’s meaning is a significant challenge. The traditional approach, representing words as multiple discrete senses, generally fails to capture the complex meaning variation words exhibit in context (variation which is, as a rule, as varied as the contexts themselves), precisely because it arbitrarily parses that variation into a discrete set of sub-meanings. Semantic domain models fall prey to the same issue when they allocate a lexeme to multiple semantic domains: in essence, the domains predefine the set of meanings any given lexeme may fall into.
Transformer models do a good job of modelling contextual meaning (drawing on both immediate and distant context), but they have the distinct disadvantage of being largely opaque. Global vector models are better in this regard, yet they only model global context (that is, all the contexts a target word appears in across the corpus).
To tackle this problem, I will introduce an approach that combines attention mechanisms with various clustering techniques, aiming to represent words in a monosemic manner with explicit representations of contextual modulation.
For some very sketchy and prone-to-change implementation details, see my draft notebook on contextual vectors.
1. Unraveling Context with Attention Mechanisms
The process begins by transforming the text data into global vectors, and then into contextual vectors using attention mechanisms. These mechanisms, a cornerstone of modern transformer models, allow us to capture both the inherent meaning of a word and the rich context surrounding it in the form of contextual embeddings.
What I have tried so far is the following (a rough code sketch follows the list):
- Start with global word2vec or fastText vectors
- For each unique lexeme in the corpus
- Gather all sentences this lexeme occurs in
- For each gathered sentence
- Calculate the cosine similarity between the target lexeme and every other lexeme in the sentence
- Weight each similarity score by the distance from the target lexeme (further away = less similar; this is obviously a step that is amenable to multiple possible configurations)
- Store all of the resulting contextual vectors (or “attention” vectors) in a dictionary with the lexeme as the key
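To make the list above concrete, here is a minimal sketch of the attention-weighting step. It assumes the global vectors are available as a plain dictionary mapping lexemes to NumPy arrays; the function names, the linear `decay` parameter, and the blending of target and context vectors at the end are all illustrative choices, not settled parts of the method:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def contextual_vector(target_idx, tokens, vectors, decay=0.5):
    """Build a contextual ("attention") vector for the token at target_idx.

    Each context lexeme is weighted by its cosine similarity to the target,
    discounted by its linear distance from the target in the sentence.
    """
    target = vectors[tokens[target_idx]]
    weighted = np.zeros_like(target)
    total = 0.0
    for i, tok in enumerate(tokens):
        if i == target_idx or tok not in vectors:
            continue
        sim = cosine(target, vectors[tok])
        weight = sim / (1.0 + decay * abs(i - target_idx))  # further away = less weight
        weighted += weight * vectors[tok]
        total += abs(weight)
    context = weighted / total if total > 0 else weighted
    # Blend the global target vector with its weighted context (one option among many).
    return (target + context) / 2.0

# Toy usage: random vectors as a stand-in for real word2vec/fastText lookups.
rng = np.random.default_rng(0)
vocab = ["the", "river", "flooded", "bank", "loan"]
vectors = {w: rng.normal(size=50) for w in vocab}

ctx_vecs = {}  # lexeme -> list of contextual vectors across the corpus
sentence = ["the", "river", "flooded", "the", "bank"]
ctx_vecs.setdefault("bank", []).append(contextual_vector(4, sentence, vectors))
```

In practice the lookup would of course come from trained word2vec or fastText models; the toy dictionary just keeps the sketch self-contained.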
2. Identifying Contextual Patterns with Clustering
The next step involves clustering these contextual vectors. Using unsupervised machine learning methods such as K-means, DBSCAN, or hierarchical clustering, I group together similar vectors. Each cluster yields a centroid that represents a typical pattern of contextual variation, rather than a separate, isolated word sense.
What I have tried so far (again, a code sketch follows the list):
- For each lexeme
- Cluster all associated contextual vectors using some clustering technique
- Identify centroid vectors within each cluster
- Allow the centroid vectors to represent the ‘word senses’
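A minimal sketch of this step, assuming scikit-learn’s KMeans and the `ctx_vecs` dictionary built in the previous sketch (the choice of K-means and the fixed `n_clusters` are stand-ins; DBSCAN or hierarchical clustering would slot in the same way):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_contexts(ctx_vectors, n_clusters=3, seed=0):
    """Cluster one lexeme's contextual vectors; return centroids and labels.

    The centroids stand in for typical patterns of contextual variation,
    not discrete, isolated word senses.
    """
    X = np.vstack(ctx_vectors)
    k = min(n_clusters, len(X))  # cannot have more clusters than vectors
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    return km.cluster_centers_, km.labels_

# 'Word senses' for each lexeme, given the ctx_vecs dictionary built earlier.
sense_centroids = {
    lexeme: cluster_contexts(vecs)[0]
    for lexeme, vecs in ctx_vecs.items()
}
```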
3. Strengthening Clustering with Ensemble Techniques
To enhance the reliability of the clusters, I implement ensemble clustering techniques. By merging the results of multiple clustering algorithms, I can build a consensus on the cluster assignments. This approach capitalizes on the strengths of each algorithm, providing a more resilient understanding of contextual modulation.
In reality, I haven’t yet worked through any means of evaluating the resulting clusters. The trickiest stage in any clustering process (in my experience) is determining how many clusters you need. This is always a lossy process, so there seem to be drawbacks no matter how you slice up the semantic space.
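For what it’s worth, one common way to build such a consensus is a co-association matrix: run several base clusterers, count how often each pair of vectors lands in the same cluster, and then cluster that agreement matrix. The sketch below assumes scikit-learn (recent versions use `metric="precomputed"`; older versions call the argument `affinity`) and is one possible reading of the ensemble step, not the settled procedure:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

def consensus_labels(X, n_clusters=3, seed=0):
    """Ensemble clustering via a co-association matrix.

    Run several base clusterers, count how often each pair of vectors ends up
    in the same cluster, then cluster the resulting agreement matrix.
    """
    base_labelings = [
        KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X),
        AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X),
        DBSCAN(eps=0.5, min_samples=2).fit_predict(X),  # eps is a free parameter
    ]
    n = len(X)
    co_assoc = np.zeros((n, n))
    for labels in base_labelings:
        for i in range(n):
            for j in range(n):
                # DBSCAN noise points (label -1) never co-associate.
                if labels[i] != -1 and labels[i] == labels[j]:
                    co_assoc[i, j] += 1
    co_assoc /= len(base_labelings)
    np.fill_diagonal(co_assoc, 1.0)  # every vector fully agrees with itself
    # Treat (1 - co-association) as a distance and cluster the consensus.
    return AgglomerativeClustering(
        n_clusters=n_clusters, metric="precomputed", linkage="average"
    ).fit_predict(1.0 - co_assoc)
```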
4. Advanced Techniques: Neighbour-Aware and Consensus Clustering
I further refine the clustering process with two more techniques: neighbour-aware and consensus clustering. Neighbour-aware clustering considers the nearest neighbours of each vector, providing a better representation of local contextual variation. In consensus clustering, I amalgamate the outcomes from several neighbour-aware clustering methods to determine the most accurate cluster assignments.
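The exact neighbour-aware procedure is still in flux. One concrete reading, sketched below with scikit-learn’s SpectralClustering, is to cluster over a k-nearest-neighbour affinity graph, so that each vector’s assignment is driven by its local neighbourhood rather than global distances alone; consensus over several such runs (e.g. with different neighbourhood sizes) could then reuse the co-association idea from the previous sketch:

```python
from sklearn.cluster import SpectralClustering

def neighbour_aware_labels(X, n_clusters=3, n_neighbours=5, seed=0):
    """Cluster over a k-nearest-neighbour affinity graph, so each vector's
    assignment reflects its local neighbourhood."""
    sc = SpectralClustering(
        n_clusters=n_clusters,
        affinity="nearest_neighbors",
        n_neighbors=min(n_neighbours, len(X) - 1),
        assign_labels="kmeans",
        random_state=seed,
    )
    return sc.fit_predict(X)
```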
Conclusion
Through the synergistic combination of attention mechanisms and multiple clustering techniques, I am developing a method to represent lexical meanings in a monosemic manner that transparently accounts for the role of contextual modulation in arriving at ‘typical’ word senses.
I’ve always been inclined to challenge the common assumption that multiple word senses are inherent features of lexemes (rather than emergent features of larger contexts). This work in progress offers one attempt at representing and analyzing text data to this end. It opens up new possibilities for text understanding and semantic analysis in NLP, paving the way for more accurate and contextually aware models of lexical semantics: models that recognize decontextualized lexemes as meaning potentials, and contextualized lexemes as specifications of those potentials.