What are Skip-grams?
A skip-gram is a generalization of an n-gram where the components (words or characters) do not need to be consecutive in the original text. Instead, they can be separated by gaps (skips).
The term is widely used in Natural Language Processing (NLP). In Word2Vec, for example, the skip-gram model learns word embeddings by predicting the words within a window around each target word, so it captures context from words that are nearby but not immediately adjacent.
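To show what "context within a window" means in practice, here is a simplified sketch of how Word2Vec-style skip-gram training pairs can be formed. It assumes a fixed, symmetric window. The function name `skipgram_pairs` and the window size are hypothetical. The real Word2Vec implementation adds refinements such as randomly shrinking the window and subsampling frequent words, and it then trains a neural network on these pairs, which is not shown here.

```python
def skipgram_pairs(tokens, window=2):
    """Hypothetical helper: pair each target token with every token at most
    `window` positions before or after it (fixed window, an assumption)."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)               # clamp at the start of the text
        hi = min(len(tokens), i + window + 1)  # clamp at the end of the text
        for j in range(lo, hi):
            if j != i:                         # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs("The quick brown fox".split(), window=2))
# [('The', 'quick'), ('The', 'brown'), ('quick', 'The'), ('quick', 'brown'),
#  ('quick', 'fox'), ('brown', 'The'), ('brown', 'quick'), ('brown', 'fox'),
#  ('fox', 'quick'), ('fox', 'brown')]
```

Unlike the k-skip-n-grams this tool produces, these pairs are generated in both directions, because the model predicts context on either side of the target word.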
k-skip-n-grams Explained
This tool uses the k-skip-n-gram definition, which has two parameters:
- n: The number of items (words/chars) in the sequence.
- k: The maximum number of items that can be skipped between any two items in the sequence.
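One way to state this precisely, for a text of tokens $w_1, \dots, w_m$, is shown below. It assumes the "total skip" reading: the number of skipped positions, summed over all gaps inside one gram, is at most k. A stricter "per-gap" reading would instead bound each gap separately; the two readings agree when n = 2 and differ only for n ≥ 3.

```latex
S_{n,k} = \Big\{ (w_{i_1}, \dots, w_{i_n}) \;\Big|\;
  1 \le i_1 < i_2 < \dots < i_n \le m,\;
  \sum_{j=2}^{n} \big(i_j - i_{j-1} - 1\big) \le k \Big\}

% The sum telescopes, so the condition is equivalent to
i_n - i_1 \le (n - 1) + k
```

For "The quick brown fox" (m = 4) with n = 2 and k = 1, the condition becomes $i_2 - i_1 \le 2$, which admits exactly five index pairs: (1,2), (1,3), (2,3), (2,4), (3,4).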
For example, in the sentence "The quick brown fox":
- Regular bigrams (n=2, k=0): "The quick", "quick brown", "brown fox"
- 1-skip-2-grams (n=2, k=1): all of the above, plus "The brown" (skipping 'quick') and "quick fox" (skipping 'brown'). "The fox" is not included, because it skips two words and therefore requires k=2.
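The example above can be reproduced with a short generator. This is a minimal sketch, not the tool's actual code: the function name `skipgrams` is hypothetical, and it assumes k limits the total number of skipped tokens inside one gram (for n = 2, the only case in the example, this is the same as limiting each gap).

```python
from itertools import combinations

def skipgrams(tokens, n, k):
    """Hypothetical helper: return every k-skip-n-gram of `tokens` as a tuple.

    Assumption: k limits the TOTAL number of skipped tokens inside one gram,
    so the last item may sit at most (n - 1) + k positions after the first.
    """
    if n < 1 or k < 0:
        raise ValueError("n must be >= 1 and k must be >= 0")
    grams = []
    for start in range(len(tokens)):
        # Candidates for the remaining n - 1 items: the next (n - 1) + k tokens.
        window = tokens[start + 1 : start + n + k]
        # combinations() preserves the left-to-right order of the window.
        for rest in combinations(window, n - 1):
            grams.append((tokens[start],) + rest)
    return grams

words = "The quick brown fox".split()
print(skipgrams(words, n=2, k=0))
# [('The', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
print(skipgrams(words, n=2, k=1))
# [('The', 'quick'), ('The', 'brown'), ('quick', 'brown'), ('quick', 'fox'), ('brown', 'fox')]
```

Raising k to 2 adds ('The', 'fox'). Anchoring each gram at its first token and choosing the rest from a bounded window avoids enumerating every subsequence of the text and then filtering.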
How to use this tool
- Enter Text: Paste your content or drag & drop a file.
- Set N (Size): Choose the length of the gram (e.g., 2 for pairs).
- Set K (Skips): Choose the maximum skip distance allowed.
- View Results: The tool generates all valid subsequences meeting your criteria.
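The four steps above can be sketched end to end as follows. Everything here is an assumption for illustration: the function `skipgram_report` and its `unit` parameter are hypothetical, word mode uses naive whitespace splitting (no punctuation handling), character mode treats every character, including spaces, as a token, and results are counted so that repeated grams appear once with a frequency.

```python
from collections import Counter
from itertools import combinations

def skipgrams(tokens, n, k):
    # Hypothetical helper; assumes k bounds the TOTAL skips inside one gram.
    if n < 1 or k < 0:
        raise ValueError("n must be >= 1 and k must be >= 0")
    grams = []
    for start in range(len(tokens)):
        for rest in combinations(tokens[start + 1 : start + n + k], n - 1):
            grams.append((tokens[start],) + rest)
    return grams

def skipgram_report(text, n, k, unit="word"):
    """Hypothetical pipeline: tokenize (step 1), generate with N and K
    (steps 2-3), then count the resulting grams for display (step 4)."""
    if unit == "word":
        tokens, sep = text.split(), " "   # naive whitespace tokenization
    elif unit == "char":
        tokens, sep = list(text), ""      # every character is a token
    else:
        raise ValueError("unit must be 'word' or 'char'")
    counts = Counter(sep.join(g) for g in skipgrams(tokens, n, k))
    return counts.most_common()           # most frequent grams first

print(skipgram_report("to be or not to be", n=2, k=0))
# [('to be', 2), ('be or', 1), ('or not', 1), ('not to', 1)]
print(skipgram_report("abc", n=2, k=1, unit="char"))
# [('ab', 1), ('ac', 1), ('bc', 1)]
```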