Text Utilities
Comprehensive utilities for text manipulation with Indonesian language context optimization.
Overview
The Text module provides a suite of tools for robust string manipulation, designed specifically to handle Indonesian language nuances. It addresses common challenges such as proper title casing with particles, expanding local abbreviations, generating URL-safe slugs, and comparing strings with various sensitivity levels.
Features
- Smart Capitalization: Title casing that respects Indonesian particles (di, ke, dari, yang) and keeps acronyms (PT, CV, etc.) uppercase.
- Abbreviation Handling: Expand and contract common Indonesian abbreviations (e.g., “Jl.” to “Jalan”, “Bpk.” to “Bapak”).
- Slug Generation: Create SEO-friendly slugs that handle Indonesian conjunctions (e.g., “&” becomes “dan”, “/” becomes “atau”).
- Text Extraction: Truncate text intelligently at word boundaries and extract words while preserving Indonesian hyphenated words (e.g., “anak-anak”).
- Robust Comparison: Compare strings with options to ignore case, whitespace, and accents for accurate searching and filtering.
- Sanitization: Clean and normalize text input to prevent injection and rendering issues and to ensure consistency.
- Text Masking: Privacy-compliant data display with all, middle, and email masking patterns.
- Case Converters: Lightweight toCamelCase, toPascalCase, and toSnakeCase to replace the Lodash dependency.
- Syllable Counting: Algorithm-based syllable counter for Indonesian text with diphthong awareness.
Installation
npm
npm install @indodev/toolkit
Quick Start
import {
  toTitleCase,
  slugify,
  truncate,
  compareStrings,
  expandAbbreviation,
  maskText,
  toCamelCase,
  profanityFilter,
  removeStopwords,
  toFormal,
} from '@indodev/toolkit/text';
// Title Case
console.log(toTitleCase('laskar pelangi: kisah nyata'));
// 'Laskar Pelangi: Kisah Nyata'
// Slug
console.log(slugify('Baju Pria & Wanita'));
// 'baju-pria-dan-wanita'
// Truncate
console.log(truncate('Ini adalah text yang sangat panjang', 20));
// 'Ini adalah text...'
// Comparison
console.log(compareStrings('surabaya', 'SURABAYA'));
// true
// Abbreviation
console.log(expandAbbreviation('Jl. Sudirman No. 1'));
// 'Jalan Sudirman Nomor 1'
// Mask Text
console.log(maskText('08123456789', { pattern: 'middle', visibleStart: 4, visibleEnd: 3 }));
// '0812****789'
// Case Converters
console.log(toCamelCase('hello-world'));
// 'helloWorld'
// Profanity Filter
console.log(profanityFilter('jangan bicara kasar anjing'));
// 'jangan bicara kasar ******'
// Stopwords Removal
console.log(removeStopwords('saya sedang makan nasi'));
// 'makan nasi'
// Normalization
console.log(toFormal('gw pen makan nih'));
// 'saya ingin makan ini'
API Reference
toTitleCase()
Converts text to title case, respecting Indonesian grammar rules for particles (di, ke, dari, yang, etc.) and keeping acronyms uppercase.
Type Signature:
function toTitleCase(text: string, options?: TitleCaseOptions): string;
Parameters:
| Name | Type | Description |
|---|---|---|
| text | string | The text to convert |
| options | TitleCaseOptions | Optional configuration |
Returns:
string - Title-cased text
Indonesian Rules:
- Particles like di, ke, dari, yang, pada, untuk remain lowercase unless they are the first word.
- Acronyms like PT, CV, SD, SMP, SMA are forced to uppercase.
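The two rules above can be sketched in a few lines. This is only an illustration, not the library's implementation: `titleCaseSketch`, `PARTICLES`, and `ACRONYMS` are hypothetical names, and the word lists contain just the entries mentioned in this section.

```typescript
// Hypothetical word lists for illustration; the library's real lists may differ.
const PARTICLES = new Set(['di', 'ke', 'dari', 'yang', 'pada', 'untuk']);
const ACRONYMS = new Set(['PT', 'CV', 'SD', 'SMP', 'SMA']);

function titleCaseSketch(text: string): string {
  return text
    .split(' ')
    .map((word, index) => {
      if (word === '') return word; // keep repeated spaces untouched
      const lower = word.toLowerCase();
      // Rule 2: known acronyms are forced to uppercase.
      if (ACRONYMS.has(word.toUpperCase())) return word.toUpperCase();
      // Rule 1: particles stay lowercase unless they open the text.
      if (index > 0 && PARTICLES.has(lower)) return lower;
      return lower.charAt(0).toUpperCase() + lower.slice(1);
    })
    .join(' ');
}

console.log(titleCaseSketch('buku panduan belajar di rumah'));
// 'Buku Panduan Belajar di Rumah'
```

Checking the acronym list before the particle list means an ambiguous word is treated as an acronym first; that ordering is a design choice of this sketch.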
Examples:
// Basic usage
toTitleCase('buku panduan belajar di rumah');
// 'Buku Panduan Belajar di Rumah' (particle 'di' is lowercase)
// With acronyms
toTitleCase('laporan tahunan pt maju jaya');
// 'Laporan Tahunan PT Maju Jaya' ('PT' is recognized acronym)
// Sentences
toTitleCase('ibu pergi ke pasar');
// 'Ibu Pergi ke Pasar'
expandAbbreviation()
Expands common Indonesian abbreviations to their full forms. Supports addresses, titles, honorifics, and organizations.
Type Signature:
function expandAbbreviation(text: string, options?: ExpandOptions): string;
Parameters:
| Name | Type | Description |
|---|---|---|
| text | string | The text containing abbreviations |
| options | ExpandOptions | Optional configuration for mode and custom maps |
Returns:
string - Text with abbreviations expanded
Supported Categories:
- Address: Jl., Gg., No., Kec., Kab., etc.
- Title: Prof., Dr., Ir., S.Pd., etc.
- Honorific: Bpk., Ibu, Sdr., Yth., etc.
- Organization: PT., CV., UD., Tbk., etc.
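To illustrate how category-based expansion might work, here is a minimal sketch that restricts replacement to one category when a mode is given. The names `expandSketch`, `AbbreviationCategory`, and `SAMPLE_ABBREVIATIONS` are hypothetical, only two of the four categories are modeled, and the real function may tokenize and match differently.

```typescript
type AbbreviationCategory = 'address' | 'title';

// Hypothetical sample entries; the library's real tables are assumed to be larger.
const SAMPLE_ABBREVIATIONS: Record<AbbreviationCategory, Array<[string, string]>> = {
  address: [['Jl.', 'Jalan'], ['Gg.', 'Gang'], ['No.', 'Nomor']],
  title: [['Prof.', 'Profesor'], ['Dr.', 'Doktor']],
};

function expandSketch(text: string, mode?: AbbreviationCategory): string {
  // Without a mode, every category is applied.
  const categories: AbbreviationCategory[] = mode ? [mode] : ['address', 'title'];
  const lookup = new Map<string, string>();
  for (const category of categories) {
    for (const [abbr, full] of SAMPLE_ABBREVIATIONS[category]) {
      lookup.set(abbr, full);
    }
  }
  // Token-by-token replacement leaves unrelated words untouched.
  return text
    .split(' ')
    .map((token) => lookup.get(token) ?? token)
    .join(' ');
}

console.log(expandSketch('Dr. di Jl. Sudirman', 'address'));
// 'Dr. di Jalan Sudirman'
```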
Examples:
// Address expansion
expandAbbreviation('Jl. Merdeka No. 45');
// 'Jalan Merdeka Nomor 45'
// Title expansion
expandAbbreviation('Prof. Dr. Habibie');
// 'Profesor Doktor Habibie'
// Specific mode
expandAbbreviation('Dr. di Jl. Sudirman', { mode: 'address' });
// 'Dr. di Jalan Sudirman' (only expands address)
contractAbbreviation()
Contracts full words into their standard abbreviations (reverse of expand).
Type Signature:
function contractAbbreviation(text: string, options?: ExpandOptions): string;
Parameters:
| Name | Type | Description |
|---|---|---|
| text | string | The text containing full words |
| options | ExpandOptions | Optional configuration |
Returns:
string - Text with full words contracted to abbreviations
Examples:
contractAbbreviation('Jalan Jenderal Sudirman');
// 'Jl. Jenderal Sudirman'
contractAbbreviation('Profesor Doktor');
// 'Prof. Dr.'
toFormal()
Normalizes informal Indonesian text (slang/bahasa gaul) into formal Indonesian (Bahasa Baku).
Type Signature:
function toFormal(text: string): string;
Parameters:
| Name | Type | Description |
|---|---|---|
| text | string | The informal text to normalize |
Returns:
string - Formalized text.
Examples:
toFormal('gw pen makan nih'); // 'saya ingin makan ini'
toFormal('aq msh di jln'); // 'saya masih di jalan'
toFormal('gpp bsk aja'); // 'tidak apa-apa besok saja'
isAlay()
Detects if a string contains “alay” patterns (common informal Indonesian text using numbers/special characters for letters).
Type Signature:
function isAlay(text: string): boolean;
Parameters:
| Name | Type | Description |
|---|---|---|
| text | string | The text to check |
Returns:
boolean - true if alay patterns are detected.
Examples:
isAlay('4p4 k4b4r?'); // ✅ true
isAlay('Apa kabar?'); // ❌ false
slugify()
Creates URL-friendly slugs from text, handling Indonesian conjunctions and special characters.
Type Signature:
function slugify(text: string, options?: SlugifyOptions): string;
Parameters:
| Name | Type | Description |
|---|---|---|
| text | string | The text to slugify |
| options | SlugifyOptions | Optional configuration |
Returns:
string - URL-safe slug
Conjunction Handling (default behavior):
- & -> dan
- / -> atau
- + -> plus
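The conjunction mapping above can be sketched as a chain of replacements. This is a simplified illustration under stated assumptions: `slugifySketch` is a hypothetical name, and it keeps only ASCII letters and digits, whereas the real function may transliterate other characters.

```typescript
function slugifySketch(text: string, separator = '-'): string {
  return text
    .toLowerCase()
    // Conjunction symbols become Indonesian words, padded so they split cleanly.
    .replace(/&/g, ' dan ')
    .replace(/\//g, ' atau ')
    .replace(/\+/g, ' plus ')
    // Every other run of non-alphanumeric characters collapses to one space.
    .replace(/[^a-z0-9]+/g, ' ')
    .trim()
    .split(/\s+/)
    .join(separator);
}

console.log(slugifySketch('Pria & Wanita')); // 'pria-dan-wanita'
console.log(slugifySketch('Halo Dunia', '_')); // 'halo_dunia'
```

Replacing the symbols before stripping punctuation matters: if punctuation were removed first, the "&" and "/" would be lost before they could be translated.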
Examples:
// Basic usage
slugify('Cara Membuat Website');
// 'cara-membuat-website'
// Handling conjunctions
slugify('Pria & Wanita');
// 'pria-dan-wanita' (ampersand converted to 'dan')
slugify('Hitam/Putih');
// 'hitam-atau-putih' (slash converted to 'atau')
// Custom separator
slugify('Halo Dunia', { separator: '_' });
// 'halo_dunia'
truncate()
Shortens text to a maximum length without cutting words in half (smart truncation).
Type Signature:
function truncate(text: string, maxLength: number, options?: TruncateOptions): string;
Parameters:
| Name | Type | Description |
|---|---|---|
| text | string | Text to truncate |
| maxLength | number | Maximum allowed length |
| options | TruncateOptions | Configuration for ellipsis and boundaries |
Returns:
string - Truncated text
Examples:
const text = 'Ini adalah kalimat yang cukup panjang';
// Smart cut (word boundary)
truncate(text, 15);
// 'Ini adalah...'
// Custom ellipsis
truncate(text, 15, { ellipsis: '…' });
// 'Ini adalah…'
// Force cut (ignore word boundary)
truncate(text, 10, { wordBoundary: false });
// 'Ini adalah...' (might cut mid-word depending on length)
extractWords()
Splits text into words, handling Indonesian hyphenated words correctly.
Type Signature:
function extractWords(text: string, options?: ExtractOptions): string[];
Parameters:
| Name | Type | Description |
|---|---|---|
| text | string | Text to parse |
| options | ExtractOptions | Filters for min length, lowercase, etc. |
Returns:
string[] - Array of extracted words
Examples:
extractWords('Anak-anak bermain bola!');
// ['Anak-anak', 'bermain', 'bola'] (keeps 'Anak-anak' together)
extractWords('Halo, apa kabar?', { lowercase: true });
// ['halo', 'apa', 'kabar']
compareStrings()
Compares two strings with advanced options to ignore case, whitespace, and accents.
Type Signature:
function compareStrings(str1: string, str2: string, options?: CompareOptions): boolean;
Parameters:
| Name | Type | Description |
|---|---|---|
| str1 | string | First string |
| str2 | string | Second string |
| options | CompareOptions | Comparison flags |
Returns:
boolean - true if strings match according to options
Options:
interface CompareOptions {
  caseSensitive?: boolean; // default: false
  ignoreWhitespace?: boolean; // default: false
  ignoreAccents?: boolean; // default: false
}
Examples:
// Case insensitive (default)
compareStrings('Jakarta', 'jakarta');
// true
// Ignore whitespace
compareStrings(' Surabaya ', 'Surabaya', { ignoreWhitespace: true });
// true
// Ignore accents
compareStrings('Café', 'Cafe', { ignoreAccents: true });
// true
similarity()
Calculates Levenshtein similarity score between two strings (0.0 to 1.0).
Type Signature:
function similarity(str1: string, str2: string): number;
Parameters:
| Name | Type | Description |
|---|---|---|
| str1 | string | First string |
| str2 | string | Second string |
Returns:
number - Similarity score (0.0 = different, 1.0 = identical)
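For readers unfamiliar with the metric, a Levenshtein-based score can be sketched as one minus the edit distance divided by the longer string's length. That normalization is an assumption (the exact formula is not documented here), and `levenshteinSketch` and `similaritySketch` are hypothetical names. It produces values close to the examples below.

```typescript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshteinSketch(a: string, b: string): number {
  // previous[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let previous: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      current.push(
        Math.min(
          previous[j] + 1, // deletion
          current[j - 1] + 1, // insertion
          previous[j - 1] + cost // substitution
        )
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Assumed normalization: 1 - distance / length of the longer string.
function similaritySketch(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1; // two empty strings are identical
  return 1 - levenshteinSketch(a, b) / longest;
}

console.log(similaritySketch('Jakarta', 'Jakart').toFixed(2)); // '0.86'
```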
Examples:
similarity('Jakarta', 'Jakarta'); // 1.0
similarity('Jakarta', 'Jakart'); // 0.85
similarity('Jakarta', 'Bandung'); // 0.14
sanitize()
Removes unwanted characters and normalizes text.
Type Signature:
function sanitize(text: string, options?: SanitizeOptions): string;
Parameters:
| Name | Type | Description |
|---|---|---|
| text | string | Text to sanitize |
| options | SanitizeOptions | Configuration |
Returns:
string - Sanitized string
Examples:
sanitize(' Halo Dunia '); // 'Halo Dunia'
sanitize('<b>Bold</b>', { stripHtml: true }); // 'Bold'
removeAccents()
Removes diacritics from characters (e.g., ‘é’ -> ‘e’).
Type Signature:
function removeAccents(text: string): string;
Examples:
removeAccents('Café'); // 'Cafe'
removeAccents('Àpropos'); // 'Apropos'
maskText()
Masks sensitive text based on predefined patterns for privacy-compliant data display.
Type Signature:
function maskText(text: string, options?: MaskOptions): string;
Parameters:
| Name | Type | Description |
|---|---|---|
| text | string | The text to mask |
| options | MaskOptions | Masking configuration |
Returns:
string - Masked text
MaskOptions:
interface MaskOptions {
  pattern?: 'all' | 'middle' | 'email';
  maskChar?: string; // default: '*'
  visibleStart?: number; // default: 2
  visibleEnd?: number; // default: 2
}
Examples:
// Mask all characters (preserves spaces)
maskText('Budi Santoso', { pattern: 'all' });
// '**** *******'
// Mask middle portion
maskText('08123456789', { pattern: 'middle', visibleStart: 4, visibleEnd: 3 });
// '0812****789'
// Mask email (keeps first 2 chars of local part + domain)
maskText('user@example.com', { pattern: 'email' });
// 'us**@example.com'
// Custom mask character
maskText('123456789', { pattern: 'all', maskChar: '#' });
// '#########'
toCamelCase()
Converts text to camelCase format. Treats spaces, hyphens, and underscores as word boundaries.
Type Signature:
function toCamelCase(text: string): string;
Examples:
toCamelCase('hello-world'); // 'helloWorld'
toCamelCase('hello_world'); // 'helloWorld'
toCamelCase('Hello World'); // 'helloWorld'
toCamelCase('helloWorld'); // 'helloWorld'
toPascalCase()
Converts text to PascalCase format. Treats spaces, hyphens, and underscores as word boundaries.
Type Signature:
function toPascalCase(text: string): string;
Examples:
toPascalCase('hello_world'); // 'HelloWorld'
toPascalCase('hello-world'); // 'HelloWorld'
toPascalCase('hello world'); // 'HelloWorld'
toSnakeCase()
Converts text to snake_case format. Handles camelCase, spaces, hyphens, and underscores.
Type Signature:
function toSnakeCase(text: string): string;
Examples:
toSnakeCase('helloWorld'); // 'hello_world'
toSnakeCase('Hello-World'); // 'hello_world'
toSnakeCase('Hello World'); // 'hello_world'
countSyllables()
Counts the number of syllables in an Indonesian word using algorithm-based vowel counting with diphthong awareness.
Type Signature:
function countSyllables(text: string): number;
Returns:
number - Number of syllables
Syllable Rules:
- Each vowel group (a, i, u, e, o) counts as one syllable
- Indonesian diphthongs (ai, au, oi) count as a single vowel sound
- Minimum 1 syllable for any word with letters
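One simple reading of these rules is to count maximal runs of vowels, so that a diphthong such as "au" naturally stays inside a single run. The sketch below takes that reading; `countSyllablesSketch` is a hypothetical name, and the real algorithm may well be more refined (for example, this version would also merge adjacent vowels that belong to separate syllables).

```typescript
function countSyllablesSketch(word: string): number {
  // Keep ASCII letters only, so digits and punctuation do not count as content.
  const letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (letters === '') return 0;
  // Each maximal run of vowels is treated as one syllable nucleus.
  const vowelGroups = letters.match(/[aiueo]+/g) ?? [];
  // Words made of letters but no vowels still count as one syllable.
  return Math.max(1, vowelGroups.length);
}

console.log(countSyllablesSketch('matahari')); // 4
console.log(countSyllablesSketch('pulau')); // 2
```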
Examples:
// Indonesian words
countSyllables('buku'); // 2 (bu-ku)
countSyllables('matahari'); // 4 (ma-ta-ha-ri)
countSyllables('pulau'); // 2 (pu-lau, diphthong 'au')
// English words
countSyllables('hello'); // 2 (hel-lo)
countSyllables('beautiful'); // 3 (beau-ti-ful)
// Edge cases
countSyllables(''); // 0
countSyllables('a'); // 1
countSyllables('rhythm'); // 1 (no vowels, minimum 1)
profanityFilter()
Filters out profanity using a comprehensive list of Indonesian and English curse words.
Type Signature:
function profanityFilter(text: string, options?: FilterOptions): string;
Parameters:
| Name | Type | Description |
|---|---|---|
| text | string | The text to filter |
| options | FilterOptions | Optional filtering configuration |
Returns:
string - Filtered text with profanity masked.
Examples:
profanityFilter('jangan bicara kasar anjing');
// 'jangan bicara kasar ******'
profanityFilter('kasar banget asu', { replacement: '---' });
// 'kasar banget ---'
removeStopwords()
Removes common Indonesian and English stopwords (words with little semantic value like ‘yang’, ‘di’, ‘the’, ‘is’).
Type Signature:
function removeStopwords(text: string, language?: 'id' | 'en' | 'both'): string;
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| text | string | - | The text to process |
| language | 'id' \| 'en' \| 'both' | 'both' | The language of stopwords |
Returns:
string - Text with stopwords removed.
Examples:
removeStopwords('saya sedang makan nasi');
// 'makan nasi'
removeStopwords('this is a book', 'en');
// 'book'
Type Reference
TitleCaseOptions
Configuration input for toTitleCase.
export interface TitleCaseOptions {
  /**
   * If true, keeps original casing for words not in the exception list.
   * @default false
   */
  preserveOriginal?: boolean;
}
SlugifyOptions
Configuration input for slugify.
export interface SlugifyOptions {
  /**
   * Separator character (e.g. '-', '_')
   * @default '-'
   */
  separator?: string;
  /**
   * Convert '&' -> 'dan', '/' -> 'atau'
   * @default true
   */
  convertConjunctions?: boolean;
}
TruncateOptions
Configuration input for truncate.
export interface TruncateOptions {
  /**
   * String to append when text is truncated.
   * @default '...'
   */
  ellipsis?: string;
  /**
   * Respect word boundaries (avoid cutting mid-word).
   * @default true
   */
  wordBoundary?: boolean;
}
ExtractOptions
Configuration input for extractWords.
export interface ExtractOptions {
  /**
   * Convert words to lowercase.
   * @default false
   */
  lowercase?: boolean;
  /**
   * Remove punctuation marks.
   * @default true
   */
  removePunctuation?: boolean;
  /**
   * Minimum length of words to extract.
   * @default 1
   */
  minLength?: number;
}
CompareOptions
Configuration input for compareStrings.
export interface CompareOptions {
  /**
   * Perform case-sensitive comparison.
   * @default false
   */
  caseSensitive?: boolean;
  /**
   * Ignore leading/trailing whitespace.
   * @default false
   */
  ignoreWhitespace?: boolean;
  /**
   * Ignore diacritics/accents.
   * @default false
   */
  ignoreAccents?: boolean;
}
Common Use Cases
Search Filtering
import { compareStrings } from '@indodev/toolkit/text';
function searchItems(items: string[], query: string) {
  return items.filter((item) =>
    compareStrings(item, query, {
      caseSensitive: false,
      ignoreWhitespace: true,
    })
  );
}
URL Generation for Blog Posts
import { slugify } from '@indodev/toolkit/text';
function createBlogPostUrl(title: string) {
  // SlugifyOptions documents only separator and convertConjunctions;
  // the slugify examples above already return lowercase output.
  const customSlug = slugify(title, {
    separator: '-',
  });
  return `/blog/${customSlug}`;
}
Address Formatting
import { toTitleCase, expandAbbreviation } from '@indodev/toolkit/text';
function formatAddress(rawAddress: string) {
  // 1. Expand abbreviations (Jl., Gg., No.)
  const expanded = expandAbbreviation(rawAddress, { mode: 'address' });
  // 2. Fix casing (Jalan Mawar Melati)
  return toTitleCase(expanded);
}
console.log(formatAddress('jl. mawar melati no. 5'));
// 'Jalan Mawar Melati Nomor 5'
Best Practices
Input Sanitization
Always sanitize user input before storing or processing it, especially when dealing with names or addresses that might contain irregular whitespace or HTML tags.
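import { sanitize } from '@indodev/toolkit/text';
// userInput is whatever raw string your form or API handler received.
const cleanInput = sanitize(userInput, { stripHtml: true });
Slug Uniqueness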
The slugify function generates a URL-safe string, but it does not guarantee uniqueness in your database. Always check for existing slugs and append a suffix if necessary.
// db.slugExists stands in for your own data-access layer; it is not part of the toolkit.
const baseSlug = slugify(title);
let finalSlug = baseSlug;
let counter = 1;
while (await db.slugExists(finalSlug)) {
  finalSlug = `${baseSlug}-${counter++}`;
}
Comparison Performance
For large lists, if you only need exact case-insensitive matching, standard string comparison methods might be slightly faster. Use compareStrings when you need robust handling of whitespace, accents, or specific comparison rules.
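To make the trade-off concrete, the sketch below shows the kind of normalization such a comparison might perform before testing equality. It is an illustration only: `compareSketch`, `normalizeForCompare`, and `CompareSketchOptions` are hypothetical names, and the option semantics are taken from the CompareOptions reference above rather than from the library source.

```typescript
interface CompareSketchOptions {
  caseSensitive?: boolean;
  ignoreWhitespace?: boolean;
  ignoreAccents?: boolean;
}

function normalizeForCompare(value: string, options: CompareSketchOptions): string {
  let result = value;
  if (!options.caseSensitive) result = result.toLowerCase();
  // Per the type reference, only leading/trailing whitespace is ignored.
  if (options.ignoreWhitespace) result = result.trim();
  if (options.ignoreAccents) {
    // NFD splits 'é' into 'e' plus a combining mark; the regex drops the marks.
    result = result.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  }
  return result;
}

function compareSketch(a: string, b: string, options: CompareSketchOptions = {}): boolean {
  return normalizeForCompare(a, options) === normalizeForCompare(b, options);
}

console.log(compareSketch('Café', 'Cafe', { ignoreAccents: true })); // true
```

Each enabled option adds a pass over both strings, which is why a plain lowercase comparison can be cheaper when nothing else is needed.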
Troubleshooting
truncate cutting text shorter than expected
If wordBoundary is true (default), truncate will cut at the last space before the maxLength to avoid splitting a word. This means the resulting string might be shorter than maxLength. If you need exact character count, set wordBoundary: false.
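The behavior described above can be sketched as follows. This assumes the ellipsis is appended after the cut and is not counted toward maxLength; that assumption matches the documented truncate examples but is not confirmed by the source. `truncateSketch` is a hypothetical name.

```typescript
function truncateSketch(
  text: string,
  maxLength: number,
  ellipsis = '...',
  wordBoundary = true
): string {
  if (text.length <= maxLength) return text; // nothing to cut
  let cut = text.slice(0, maxLength);
  if (wordBoundary) {
    // Back up to the last space so no word is split in half.
    const lastSpace = cut.lastIndexOf(' ');
    if (lastSpace > 0) cut = cut.slice(0, lastSpace);
  }
  return cut.trimEnd() + ellipsis;
}

const sample = 'Ini adalah kalimat yang cukup panjang';
console.log(truncateSketch(sample, 15)); // 'Ini adalah...'
console.log(truncateSketch(sample, 15, '...', false)); // 'Ini adalah kali...'
```

With the boundary check on, the cut falls back from "Ini adalah kali" to "Ini adalah", which is why the result is shorter than the requested length.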
compareStrings returns false for similar strings
Check whether the ignoreWhitespace and ignoreAccents options are set correctly. By default, compareStrings ignores case only; whitespace and accents must match exactly. If one string has a newline or tab and the other doesn’t, they won’t match unless whitespace is normalized.
Note: similarity() is based strictly on Levenshtein distance and is case-sensitive. For case-insensitive similarity, convert both strings to lowercase before calling similarity().