Bug Report: Emoji Variation Selectors Not Removed from Slugs #53

@comanche

Description

The slug() function does not fully remove emojis that contain Unicode variation selectors (U+FE00 to U+FE0F), leaving invisible characters in the generated slug. This creates broken anchor links.

Expected Behavior

Based on the test fixtures, emojis should be completely removed from slugs:

  • "😄 unicode emoji" → "-unicode-emoji"

Actual Behavior

Emojis composed with variation selectors leave the invisible variation selector character in the slug:

  • "🗃️ File Cabinet" → "️-file-cabinet" ✗ (contains invisible U+FE0F)

Reproduction

const { slug } = require('github-slugger');

// Simple emoji - works correctly
console.log(slug('✅ Checkmark'));
// Output: "-checkmark" ✓

// Emoji with variation selector - leaves invisible character
console.log(slug('🗃️ File Cabinet'));
// Output: "️-file-cabinet" ✗ (starts with invisible U+FE0F)

// Inspect the bytes to see the problem
const result = slug('🗃️ File Cabinet');
console.log('Bytes:', Buffer.from(result).toString('hex'));
// Output: "efb88f2d66696c652d636162696e6574"
//          ^^^^^^ = U+FE0F (variation selector - should not be here!)

Root Cause

Some emojis are composed of multiple Unicode code points:

  1. Base emoji character (e.g., U+1F5C3 = 🗃)
  2. Variation Selector-16 (U+FE0F), which requests emoji-style (color) presentation

Examples:

  • 🗃️ = U+1F5C3 + U+FE0F (2 code points)
  • 1️⃣ = U+0031 + U+FE0F + U+20E3 (3 code points)
  • ✅ = U+2705 (1 code point) ✓

The current implementation removes the visible emoji character but leaves the invisible variation selector behind.
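To make the breakdown above concrete, here is a small sketch that lists the code points of each example. `codePoints` is a hypothetical helper written for this report, not part of github-slugger, and the emojis are written as escapes so the selector is visible in the source:

```javascript
// Hypothetical helper (not part of github-slugger):
// returns the code points of a string as "U+XXXX" labels.
function codePoints(text) {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

console.log(codePoints('\u{1F5C3}\uFE0F')); // [ 'U+1F5C3', 'U+FE0F' ]
console.log(codePoints('1\uFE0F\u20E3'));   // [ 'U+0031', 'U+FE0F', 'U+20E3' ]
console.log(codePoints('\u2705'));          // [ 'U+2705' ]
```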

Impact

This breaks markdown anchor links in real-world usage:

## 🗃️ File Management

Link to section: [See File Management](#-file-management)

The link won't work because the generated slug starts with an invisible U+FE0F, so it doesn't match the hand-typed anchor "-file-management".
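A minimal sketch of the mismatch, assuming slug() returns the heading's slug with a leading U+FE0F as described above. The `generated` string is a hand-written stand-in, not captured library output:

```javascript
// Stand-in for the slug reportedly generated for "## 🗃️ File Management".
const generated = '\uFE0F-file-management';
// What an author types as the link target (without the leading "#").
const typed = '-file-management';

// The two strings look identical when printed but are not equal.
console.log(generated === typed);            // false
console.log(generated.length, typed.length); // 17 16
```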

Affected Emojis

Common emojis with variation selectors include:

  • 🗃️ (File Cabinet)
  • ☑️ (Check Box)
  • ✔️ (Check Mark)
  • ⭐️ (Star)
  • ❤️ (Red Heart)
  • 1️⃣ 2️⃣ 3️⃣ (Keycap Numbers)
  • Many others with U+FE0F modifier

Proposed Fix

Strip variation selectors (U+FE00 through U+FE0F) after processing emojis:

function slug(value) {
  // ... existing slug generation code ...

  // Remove variation selectors that may remain after emoji removal
  result = result.replace(/[\uFE00-\uFE0F]/g, '');

  return result;
}
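The stripping step can be tried on its own, without touching the library. This is only a sketch: `stripVariationSelectors` is a hypothetical helper name, and `reported` is a hand-written stand-in for the output shown in the reproduction, not a call into github-slugger:

```javascript
// Matches the 16 variation selectors U+FE00 through U+FE0F.
const VARIATION_SELECTORS = /[\uFE00-\uFE0F]/g;

// Hypothetical helper: removes variation selectors from an already-generated slug.
function stripVariationSelectors(slugText) {
  return slugText.replace(VARIATION_SELECTORS, '');
}

// Stand-in for the reported output of slug('🗃️ File Cabinet').
const reported = '\uFE0F-file-cabinet';
const cleaned = stripVariationSelectors(reported);

console.log(JSON.stringify(cleaned));              // "-file-cabinet"
console.log(Buffer.from(cleaned).toString('hex')); // 2d66696c652d636162696e6574
```

The hex output is the same as in the reproduction minus the leading `efb88f`, which is the UTF-8 encoding of U+FE0F.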

Workaround

Users can post-process the slug output:

const { slug } = require('github-slugger');
const result = slug('🗃️ File Cabinet').replace(/[\uFE00-\uFE0F]/g, '');
// result: "-file-cabinet" ✓

Environment

  • github-slugger version: 2.0.0 (latest)
  • Node.js version: v20+
  • Platform: All platforms

Additional Context

Similar issues have been addressed in related libraries.

This is becoming more common as modern emoji keyboards automatically add variation selectors for better rendering.


Would you like me to submit a pull request with the fix?
