How LSP Autocomplete Sorting Works: A Simple Fix for Better Suggestions
I forked a Python type checker just to fix autocomplete sorting. That’s how frustrated I was.
When I typed self. in VSCode, I expected self.__init__ or self.name to appear at the top. Instead, I got:
self.__class__    <- Rarely used
self.__delattr__  <- Never needed
self.__dict__     <- Sometimes useful
self.__dir__      <- What is this?
...
self.init         <- Finally! (after scrolling)

The Problem
The Language Server Protocol (LSP) defines how editors communicate with language servers for features like autocomplete. But the protocol leaves sorting strategy almost entirely to the implementation.
Here’s what the LSP specification says about completion items:
interface CompletionItem {
  label: string;              // The label shown to the user
  kind?: CompletionItemKind;  // Method, function, variable, etc.
  detail?: string;            // Additional details
  documentation?: string;     // Full docs
  sortText?: string;          // Sort order override
  filterText?: string;        // Filter override
  insertText?: string;        // What gets inserted
  // ... more optional fields
}

Notice sortText? It’s optional. And many language servers don’t set it meaningfully.
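To make the effect of the optional field concrete, here is a minimal Python sketch of how a client might choose a sort key. The LSP specification says the label is used when sortText is omitted. The helper name effective_sort_key and the sample items are assumptions made for illustration, not code from any real client.

```python
def effective_sort_key(item: dict) -> str:
    """Return sortText when the server set it, otherwise fall back to the label."""
    return item.get("sortText") or item["label"]


# Hypothetical completion items for "self."
items = [
    {"label": "__class__"},                # no sortText: sorted by label
    {"label": "name", "sortText": "000"},  # the server pushed this one up
    {"label": "__dict__"},                 # no sortText: sorted by label
]

ordered = sorted(items, key=effective_sort_key)
print([item["label"] for item in ordered])
# ['name', '__class__', '__dict__']
```

Only the item that carries a sortText escapes the label ordering, which is why a server that never sets the field effectively hands you an alphabetical list.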
Why Sorting Feels Random
I dug into how LSP handles completion sorting and found three main issues:
1. Alphabetical by default
When a server provides no sortText, the LSP specification has the client use the label as the sort key, so the list ends up alphabetical. Because the underscore sorts before lowercase letters, __class__ appears before init.
2. No usage tracking
LSP doesn’t track which completions you actually select. The protocol has no mechanism for learning from user behavior.
3. Static heuristics
Some servers use static heuristics based on item type (methods before variables, for example), but these don’t account for your specific usage patterns.
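The difference between the label fallback and a static kind-based heuristic can be sketched as follows. The numeric CompletionItemKind values come from the LSP specification. The priority table, the helper name and the sample items are assumptions chosen for illustration.

```python
# CompletionItemKind values as defined in the LSP specification.
METHOD, FUNCTION, FIELD, VARIABLE = 2, 3, 5, 6

# Hypothetical priority table: a lower number sorts first.
KIND_PRIORITY = {METHOD: 0, FUNCTION: 1, FIELD: 2, VARIABLE: 3}


def kind_sort_key(item: tuple[str, int]) -> tuple[int, str]:
    """Rank by kind first, then alphabetically by label."""
    label, kind = item
    return (KIND_PRIORITY.get(kind, 99), label)


items = [
    ("__class__", FIELD),
    ("name", FIELD),
    ("__dir__", METHOD),
    ("save", METHOD),
]

alphabetical = [label for label, _ in sorted(items)]
kind_based = [label for label, _ in sorted(items, key=kind_sort_key)]

print(alphabetical)  # ['__class__', '__dir__', 'name', 'save']
print(kind_based)    # ['__dir__', 'save', '__class__', 'name']
```

In this sketch the kind-based order moves methods ahead of fields, yet a rarely used dunder method still lands at the top, because nothing in the key reflects what you actually select.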
┌─────────────────────────────────────────────┐
│ LSP Sorting Options                         │
├─────────────────────────────────────────────┤
│ 1. sortText field (optional)                │
│ 2. Alphabetical (default fallback)          │
│ 3. Kind-based ranking (some servers)        │
│ 4. Usage frequency (rarely implemented)     │
└─────────────────────────────────────────────┘

A Reddit user captured this frustration:
“It feels quite absurd that basic intelligence like this is lacking from the majority of programs” - 40 points
The Solution: Hash Table Lookup
I created a simple solution: a hash table of commonly used prefixes.
# Common Python prefixes ranked by actual usage
COMMON_PREFIXES = {
    'self.': {
        '__init__': 100,
        'name': 95,
        'value': 90,
        'data': 85,
        'id': 80,
        # ... more common attributes
    },
    'os.': {
        'path': 100,
        'environ': 95,
        'getcwd': 90,
        'listdir': 85,
        'remove': 80,
        # ... more common functions
    },
    'json.': {
        'loads': 100,
        'dumps': 95,
        'load': 90,
        'dump': 85,
    },
    'np.': {  # NumPy shortcuts
        'array': 100,
        'zeros': 95,
        'ones': 90,
        'mean': 85,
    },
}


def rank_completion(prefix: str, completion: str) -> int:
    """Return ranking score for a completion item."""
    if prefix not in COMMON_PREFIXES:
        return 0
    return COMMON_PREFIXES[prefix].get(completion, 0)


def sort_completions(prefix: str, items: list[str]) -> list[str]:
    """Sort completions by usage frequency."""
    # Sort by rank (descending), then alphabetically
    return sorted(items, key=lambda x: (-rank_completion(prefix, x), x))

This approach has clear advantages:
┌─────────────────────────────────────────────┐
│ Hash Table vs AI-based Sorting              │
├─────────────────────────────────────────────┤
│ Hash Table:                                 │
│ - O(1) lookup time                          │
│ - Predictable behavior                      │
│ - Easy to debug and modify                  │
│ - No CPU overhead                           │
│ - Handles known patterns well               │
├─────────────────────────────────────────────┤
│ AI-based:                                   │
│ - Slower inference                          │
│ - CPU-intensive                             │
│ - Black box behavior                        │
│ - Better for unknown patterns               │
│ - Higher maintenance cost                   │
└─────────────────────────────────────────────┘

How LSP Completion Actually Works
Let me show you how completion items flow through the LSP protocol:
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    VSCode    │────▶│  LSP Client  │────▶│   Language   │
│   (Editor)   │     │   (Bridge)   │     │    Server    │
└──────────────┘     └──────────────┘     └──────────────┘
       │                    │                    │
       │ textDocument/      │                    │
       │ completion         │                    │
       │ ───────────────────────────────────────▶│
       │                    │                    │
       │ CompletionItem[]   │                    │
       │◀─────────────────────────────────────── │
       │                    │                    │
       │ Editor applies     │                    │
       │ client-side        │                    │
       │ sorting/filtering  │                    │
       │                    │                    │

The client (VSCode) requests completions from the language server. The server returns a list of CompletionItem objects. Then the client applies its own sorting.
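The exchange above can be sketched as the two JSON-RPC messages that travel over the wire. The method name, the params shape and the Content-Length framing follow the LSP specification. The file URI, cursor position and returned items are made-up sample values, and frame is a hypothetical helper name.

```python
import json


def frame(message: dict) -> bytes:
    """Wrap a JSON-RPC message in the Content-Length header that LSP uses."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


# Request the client sends after the user types "self." (positions are 0-based).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/completion",
    "params": {
        "textDocument": {"uri": "file:///project/example.py"},  # hypothetical file
        "position": {"line": 10, "character": 13},
    },
}

# Response from the server: here a plain list of CompletionItem objects.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": [
        {"label": "__init__", "kind": 2, "sortText": "001"},
        {"label": "name", "kind": 5, "sortText": "002"},
    ],
}

wire_bytes = frame(request)
print(wire_bytes.split(b"\r\n\r\n")[0])  # the header line with the body length
```

Nothing in the request tells the server what the user picked last time, so any ranking has to come from the server's own data or from the client after the response arrives.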
Implementing Better Sorting in a Language Server
Here’s how I modified a Python language server to use hash table ranking:
from dataclasses import dataclass
from typing import List


@dataclass
class CompletionItem:
    label: str
    kind: int
    sort_text: str = ""
    detail: str = ""


# Pre-computed prefix rankings
PREFIX_RANKS = {
    ('self', '__init__'): '001',
    ('self', 'name'): '002',
    ('os', 'path'): '001',
    ('json', 'loads'): '001',
}


def get_sort_text(prefix: str, label: str) -> str:
    """Generate sortText based on prefix and label."""
    key = (prefix, label)
    if key in PREFIX_RANKS:
        return PREFIX_RANKS[key]
    # Fall back to '999' + label so unranked items sort
    # alphabetically after every ranked item
    return f'999{label}'


def build_completions(prefix: str, items: List[str], kinds: List[int]) -> List[CompletionItem]:
    """Build completion items with proper sortText."""
    completions = []
    for label, kind in zip(items, kinds):
        completions.append(CompletionItem(
            label=label,
            kind=kind,
            sort_text=get_sort_text(prefix, label)
        ))
    return completions

The key insight: the sortText field determines display order. Clients compare it as a string, so lower values appear first and numeric ranks need zero-padding.
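Because sortText is a string, ranks such as '001' only behave like numbers when they are padded to a fixed width. The sketch below shows the pitfall. The helper name rank_to_sort_text and the width of 3 are assumptions that match the three-digit values used above.

```python
def rank_to_sort_text(rank: int, width: int = 3) -> str:
    """Zero-pad a numeric rank so that string order matches numeric order."""
    return str(rank).zfill(width)


ranks = [1, 2, 10]

unpadded = sorted(str(rank) for rank in ranks)
padded = sorted(rank_to_sort_text(rank) for rank in ranks)

print(unpadded)  # ['1', '10', '2']  -> rank 10 jumps ahead of rank 2
print(padded)    # ['001', '002', '010']
```

With a fixed width, the '999' fallback prefix also stays behind every ranked item as long as ranks never reach 999.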
VSCode Client-Side Configuration
You can also improve sorting on the VSCode side:
{
  "editor.suggestSelection": "recentlyUsedByPrefix",
  "editor.snippetSuggestions": "top",
  "editor.suggest.showMethods": true,
  "editor.suggest.showFunctions": true
}

The recentlyUsedByPrefix option tells VSCode to remember which completion you selected for each typed prefix and to preselect it the next time you type that prefix.
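To illustrate the idea behind recentlyUsedByPrefix, here is a small Python sketch of a per-prefix selection memory. It is not VSCode's implementation. The class PrefixMemory, its method names and the sample prefixes are all hypothetical.

```python
from typing import Optional


class PrefixMemory:
    """Remember the last completion chosen for each typed prefix."""

    def __init__(self) -> None:
        self._last_choice: dict[str, str] = {}

    def record(self, typed_prefix: str, chosen_label: str) -> None:
        """Store what the user accepted after typing this prefix."""
        self._last_choice[typed_prefix] = chosen_label

    def preselect(self, typed_prefix: str, labels: list[str]) -> Optional[str]:
        """Return the remembered label if it is offered, else the first item."""
        remembered = self._last_choice.get(typed_prefix)
        if remembered in labels:
            return remembered
        return labels[0] if labels else None


memory = PrefixMemory()
memory.record("co", "console")
memory.record("con", "const")

print(memory.preselect("co", ["const", "console", "continue"]))   # console
print(memory.preselect("con", ["console", "const", "continue"]))  # const
print(memory.preselect("x", ["xrange"]))                          # xrange
```

The memory lives entirely on the client, so it works with any language server and adds no load on the server side.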
Why Hasn’t This Been Fixed?
I wondered why such an obvious problem persists. The answer is architectural:
1. LSP protocol limitations
The protocol doesn’t include usage frequency or learning capabilities. Each client (VSCode, Neovim, Emacs) implements its own ranking.
2. Server diversity
There are several Python language servers (Pylance, Jedi, Pyright, pylsp). Fixing sorting in one doesn’t fix it everywhere.
3. Resource constraints
Pylance can already use 4GB+ of RAM on large projects. Adding ML-based ranking would make it worse.
4. Specification inertia
LSP is an open standard. Adding new features requires coordination between Microsoft, language server authors, and editor developers.
The Ideal Solution
A combined approach would work best:
┌────────────────────────────────────────────────────────┐
│ Proposed LSP Sorting Stack                             │
├────────────────────────────────────────────────────────┤
│ Layer 1: Hash table (fast, predictable)                │
│   - Common prefixes (self., os., json.)                │
│   - O(1) lookup                                        │
│                                                        │
│ Layer 2: Context-aware heuristics                      │
│   - Type matching                                      │
│   - Scope relevance                                    │
│   - Recent edits in file                               │
│                                                        │
│ Layer 3: Usage learning (client-side)                  │
│   - Track selection frequency                          │
│   - Personalize over time                              │
│   - No server overhead                                 │
└────────────────────────────────────────────────────────┘

Related Knowledge
Language Server Protocol (LSP): A JSON-RPC based protocol that separates language features from editors. Editors become thin clients, while language servers provide intelligence like autocomplete, go-to-definition, and diagnostics.
CompletionItemKind: An enum in LSP that categorizes completions (text, method, function, constructor, field, variable, class, interface, module, property, etc.). Useful for filtering and visual icons.
sortText vs filterText: sortText controls display order. filterText controls what text is matched against user input. They serve different purposes but both affect autocomplete UX.
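The split between the two fields can be sketched in a few lines of Python. This is a simplification: real clients use fuzzy matching, while the sketch assumes a plain prefix match. The function name visible_items and the starred label are hypothetical.

```python
def visible_items(typed: str, items: list[dict]) -> list[str]:
    """Filter on filterText (or label), then order by sortText (or label)."""
    matches = [
        item for item in items
        if (item.get("filterText") or item["label"]).startswith(typed)
    ]
    matches.sort(key=lambda item: item.get("sortText") or item["label"])
    return [item["label"] for item in matches]


items = [
    # The label carries a decoration, so filterText holds the bare name.
    {"label": "★ path", "filterText": "path", "sortText": "001"},
    {"label": "pardir"},
    {"label": "getcwd"},
]

print(visible_items("pa", items))  # ['★ path', 'pardir']
print(visible_items("g", items))   # ['getcwd']
```

Without filterText the starred item would not match "pa" at all, and without sortText it would not sit above pardir.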
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!