Tutorial | January 25, 2025 | 9 min read

How to Parse Bank Transaction Descriptions (with Code Examples)

Learn how to extract merchant names from cryptic bank transaction descriptions using regex, NLP, and enrichment APIs — with Python and Node.js examples.

If you've ever tried to build a personal finance app or expense tracker, you know the pain: bank transaction descriptions look like CHECKCARD 0315 AMZN MKTP US*2K1AB0C9Z AMZN.COM/BILL WA instead of simply "Amazon." Parsing these descriptions into clean merchant names is one of the hardest problems in fintech development.

The Problem: Why Bank Descriptions Are a Mess

Banks don't standardize transaction descriptions. What you get depends on the payment processor, the bank, and the merchant's terminal configuration. Here are real examples:

| Raw Description | Actual Merchant |
| --- | --- |
| CHECKCARD 0315 AMZN MKTP US*2K1AB0C9Z | Amazon |
| TST* SWEETGREEN - DOWNT WASHINGTON DC | Sweetgreen |
| SQ *THE COFFEE BEAN 1234 Los Angeles CA | The Coffee Bean |
| UBER *TRIP HELP.UBER.COM CA | Uber |
| PY *SPOTIFY USA 877-778-8672 NY | Spotify |
| DD *DOORDASH PANERA BREAD 800-958-3... | DoorDash (Panera Bread) |

Notice the patterns: processor prefixes (SQ *, TST*, DD *), trailing location data, phone numbers, reference codes, and truncated text. No two banks format these the same way.
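
To make those patterns concrete, here is a minimal sketch that reports which kinds of noise a raw description contains. The pattern names and the `detect_noise` helper are illustrative assumptions, not a standard taxonomy, and the regexes cover only the examples in the table above.

```python
import re

# Illustrative patterns only; real bank feeds need many more.
NOISE_PATTERNS = {
    "processor_prefix": re.compile(r"^(SQ|TST|DD|PY)\s?\*"),
    "phone_number": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    # Require at least one digit so "*SPOTIFY" is not mistaken for a code.
    "reference_code": re.compile(r"\*(?=[A-Z0-9]*\d)[A-Z0-9]{6,}\b"),
    "trailing_state": re.compile(r"\s[A-Z]{2}$"),
    "truncated": re.compile(r"\.\.\.$"),
}

def detect_noise(description: str) -> list[str]:
    """Return the names of the noise patterns found in a raw description."""
    text = description.upper().strip()
    return [name for name, pattern in NOISE_PATTERNS.items() if pattern.search(text)]

print(detect_noise("PY *SPOTIFY USA 877-778-8672 NY"))
# ['processor_prefix', 'phone_number', 'trailing_state']
```

Running this over a sample of your own feed is a quick way to see which patterns dominate before you write any cleanup rules.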

Approach 1: Regex (Quick & Dirty)

The simplest approach is to strip known prefixes and suffixes with regular expressions. This gets you 60-70% of the way there for common merchants.

Python Example

import re

def parse_transaction(description: str) -> str:
    """Basic regex-based transaction parser."""
    text = description.upper().strip()

    # Remove common prefixes
    prefixes = [
        r'^CHECKCARD \d+ ',
        r'^POS (DEBIT |PURCHASE )?',
        r'^PURCHASE AUTHORIZED ON \d+/\d+ ',
        r'^SQ \*',
        r'^TST\* ?',
        r'^DD \*',
        r'^PY \*',
        r'^UBER \*',
        r'^LYFT \*',
        r'^AMZN MKTP ',
    ]
    for prefix in prefixes:
        text = re.sub(prefix, '', text)

    # Remove trailing location (STATE abbreviation + ZIP)
    text = re.sub(r'\s+[A-Z]{2}\s*\d{5}(-\d{4})?$', '', text)

    # Remove trailing phone numbers
    text = re.sub(r'\s+\d{3}[\-.]\d{3}[\-.]\d{4}.*$', '', text)

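    # Remove trailing reference codes (require a digit so that a plain
    # six-letter word at the end of a merchant name is not stripped)
    text = re.sub(r'\s+(?=[A-Z0-9*#]*\d)[A-Z0-9*#]{6,}$', '', text)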

    # Remove trailing URLs
    text = re.sub(r'\s+\S+\.(COM|NET|ORG)\S*', '', text)

    return text.strip().title()

# Examples
print(parse_transaction("CHECKCARD 0315 AMZN MKTP US*2K1AB0C9Z"))
# => "Us*2K1Ab0C9Z" ← Not great!
print(parse_transaction("SQ *THE COFFEE BEAN 1234 Los Angeles CA"))
# => "The Coffee Bean 1234 Los Angeles Ca" ← Close but noisy

The problem is obvious: regex can't understand context. It doesn't know that "AMZN MKTP" means Amazon, or that "1234" is a store number and not part of the name. You end up maintaining an ever-growing list of patterns that still miss edge cases.

Node.js Example

function parseTransaction(description) {
  let text = description.toUpperCase().trim();

  // Remove common prefixes
  const prefixes = [
    /^CHECKCARD \d+ /,
    /^POS (DEBIT |PURCHASE )?/,
    /^SQ \*/,
    /^TST\* ?/,
    /^DD \*/,
    /^PY \*/,
  ];
  for (const prefix of prefixes) {
    text = text.replace(prefix, '');
  }

  // Remove trailing state + zip
  text = text.replace(/\s+[A-Z]{2}\s*\d{5}(-\d{4})?$/, '');

  // Remove trailing phone numbers
  text = text.replace(/\s+\d{3}[\-.]\d{3}[\-.]\d{4}.*$/, '');

  return text.trim();
}

console.log(parseTransaction("SQ *THE COFFEE BEAN 1234 Los Angeles CA 90001"));
// => "THE COFFEE BEAN 1234 LOS ANGELES" — still noisy

Approach 2: Lookup Table + Fuzzy Matching

A better approach is combining regex cleanup with a merchant database and fuzzy string matching. This handles known merchants well but still fails on local businesses.

from thefuzz import fuzz  # maintained successor to the fuzzywuzzy package

KNOWN_MERCHANTS = {
    "AMZN": "Amazon",
    "AMAZON": "Amazon",
    "STARBUCKS": "Starbucks",
    "UBER": "Uber",
    "LYFT": "Lyft",
    "NETFLIX": "Netflix",
    "SPOTIFY": "Spotify",
    # ... hundreds more needed
}

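def match_merchant(cleaned_text: str) -> str | None:
    """Return a known merchant name for the cleaned text, or None."""
    # Keys are uppercase, so normalize case first
    # (parse_transaction returns title case)
    text = cleaned_text.upper()
    # Exact substring match
    for pattern, name in KNOWN_MERCHANTS.items():
        if pattern in text:
            return name
    # Fuzzy fallback: keep the best score above 80
    best_match = None
    best_score = 80
    for pattern, name in KNOWN_MERCHANTS.items():
        score = fuzz.partial_ratio(text, pattern)
        if score > best_score:
            best_score = score
            best_match = name
    return best_match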

This works for major chains, but you need to maintain a database of thousands of merchants. And you still can't handle local businesses, international merchants, or new companies.

Approach 3: Use a Transaction Enrichment API

The most accurate and maintainable solution is to use a dedicated transaction enrichment API. These services maintain massive merchant databases, use AI models trained on billions of transactions, and handle all the edge cases for you.

Python — Using Easy Enrichment

import requests

API_KEY = "your_api_key"

def enrich_transaction(description: str, amount: float | None = None) -> dict:
    """Parse a bank transaction using Easy Enrichment API."""
    response = requests.post(
        "https://api.easyenrichment.com/enrich",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "description": description,
            "amount": amount,
            "currency": "USD"
        },
        timeout=10,  # seconds; without this, requests can wait indefinitely
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()

# Try it
result = enrich_transaction("CHECKCARD 0315 AMZN MKTP US*2K1AB0C9Z", 49.99)
print(result["merchant_name"])  # "Amazon"
print(result["category"])       # "Shopping"
print(result["logo_url"])       # "https://logo.easyenrichment.com/amazon.com"
print(result["mcc_code"])       # "5942"
print(result["confidence"])     # 0.97
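
Since the response includes a confidence score, you may want to fall back to your own cleaned text when the score is low. A minimal sketch, assuming the `merchant_name` and `confidence` fields shown above. The `pick_merchant_name` helper and the 0.8 threshold are hypothetical and not part of the API.

```python
def pick_merchant_name(result: dict, fallback: str, min_confidence: float = 0.8) -> str:
    """Prefer the enriched name, but only when the response looks trustworthy."""
    name = result.get("merchant_name")
    # Treat a missing or null confidence as zero (an assumption, not API behavior).
    confidence = result.get("confidence") or 0.0
    if name and confidence >= min_confidence:
        return name
    return fallback

print(pick_merchant_name({"merchant_name": "Amazon", "confidence": 0.97}, "Amzn Mktp"))
# Amazon
print(pick_merchant_name({"merchant_name": "Amazon", "confidence": 0.40}, "Amzn Mktp"))
# Amzn Mktp
```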

Node.js — Using Easy Enrichment

const API_KEY = 'your_api_key';

async function enrichTransaction(description, amount) {
  const response = await fetch('https://api.easyenrichment.com/enrich', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      description,
      amount,
      currency: 'USD',
    }),
  });
  return response.json();
}

// Usage (top-level await needs an ES module; built-in fetch needs Node 18+)
const result = await enrichTransaction(
  'SQ *THE COFFEE BEAN 1234 Los Angeles CA',
  5.75
);
console.log(result.merchant_name); // "The Coffee Bean & Tea Leaf"
console.log(result.category);      // "Food & Drink"
console.log(result.subcategory);   // "Coffee Shops"

Comparing the Approaches

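| Criteria | Regex | Lookup + Fuzzy | Enrichment API |
| --- | --- | --- | --- |
| Accuracy | ~40% | ~70% | ~95%+ |
| Local businesses | Fails | Fails | Works |
| Returns categories | No | Manual | Yes |
| Returns logos | No | No | Yes |
| Maintenance effort | High | High | None |
| Cost | Free | Free | $0.002/tx |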

When to Use Each Approach

  • Regex only: Quick prototyping, internal tools where accuracy doesn't matter much.
  • Lookup + fuzzy matching: When you only deal with a small set of known merchants (e.g., internal expense tracking for a corporate card).
  • Enrichment API: Any user-facing application where you need accurate merchant names, categories, and logos. The cost per transaction is negligible compared to the engineering time saved.
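
To put the cost in perspective, a quick back-of-the-envelope calculation helps. This sketch uses the $0.002 per transaction figure from the comparison table. The `monthly_cost` helper is hypothetical, and whether free requests are deducted on a paid plan is an assumption, which is why `free_requests` defaults to 0.

```python
def monthly_cost(transactions: int, price_per_tx: float = 0.002, free_requests: int = 0) -> float:
    """Estimated monthly spend in dollars for a given transaction volume."""
    billable = max(transactions - free_requests, 0)
    return round(billable * price_per_tx, 2)

print(monthly_cost(100_000))                     # 200.0
print(monthly_cost(100_000, free_requests=500))  # 199.0
print(monthly_cost(300, free_requests=500))      # 0.0
```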

Stop Writing Regex — Use Easy Enrichment

Parse any bank transaction description into a clean merchant name, category, logo, and more. Start free with 500 requests/month.