How to Parse Bank Transaction Descriptions (with Code Examples)
Learn how to extract merchant names from cryptic bank transaction descriptions using regex, NLP, and enrichment APIs — with Python and Node.js examples.
If you've ever tried to build a personal finance app or expense tracker, you know the pain: bank transaction descriptions look like CHECKCARD 0315 AMZN MKTP US*2K1AB0C9Z AMZN.COM/BILL WA instead of simply "Amazon." Parsing these descriptions into clean merchant names is one of the hardest problems in fintech development.
The Problem: Why Bank Descriptions Are a Mess
Banks don't standardize transaction descriptions. What you get depends on the payment processor, the bank, and the merchant's terminal configuration. Here are real examples:
| Raw Description | Actual Merchant |
|---|---|
| CHECKCARD 0315 AMZN MKTP US*2K1AB0C9Z | Amazon |
| TST* SWEETGREEN - DOWNT WASHINGTON DC | Sweetgreen |
| SQ *THE COFFEE BEAN 1234 Los Angeles CA | The Coffee Bean |
| UBER *TRIP HELP.UBER.COM CA | Uber |
| PY *SPOTIFY USA 877-778-8672 NY | Spotify |
| DD *DOORDASH PANERA BREAD 800-958-3... | DoorDash (Panera Bread) |
Notice the patterns: processor prefixes (SQ *, TST*, DD *), trailing location data, phone numbers, reference codes, and truncated text. No two banks format these the same way.
Approach 1: Regex (Quick & Dirty)
The simplest approach is to strip known prefixes and suffixes with regular expressions. This gets you 60-70% of the way there for common merchants.
Python Example
import re

def parse_transaction(description: str) -> str:
    """Basic regex-based transaction parser."""
    text = description.upper().strip()

    # Remove common prefixes
    prefixes = [
        r'^CHECKCARD \d+ ',
        r'^POS (DEBIT |PURCHASE )?',
        r'^PURCHASE AUTHORIZED ON \d+/\d+ ',
        r'^SQ \*',
        r'^TST\* ?',
        r'^DD \*',
        r'^PY \*',
        r'^UBER \*',
        r'^LYFT \*',
        r'^AMZN MKTP ',
    ]
    for prefix in prefixes:
        text = re.sub(prefix, '', text)

    # Remove trailing location (state abbreviation + ZIP)
    text = re.sub(r'\s+[A-Z]{2}\s*\d{5}(-\d{4})?$', '', text)
    # Remove trailing phone numbers
    text = re.sub(r'\s+\d{3}[\-.]\d{3}[\-.]\d{4}.*$', '', text)
    # Remove trailing reference codes
    text = re.sub(r'\s+[A-Z0-9*#]{6,}$', '', text)
    # Remove URLs
    text = re.sub(r'\s+\S+\.(COM|NET|ORG)\S*', '', text)

    return text.strip().title()

# Examples
print(parse_transaction("CHECKCARD 0315 AMZN MKTP US*2K1AB0C9Z"))
# => "Us*2K1Ab0C9Z" ← Not great!
print(parse_transaction("SQ *THE COFFEE BEAN 1234 Los Angeles CA"))
# => "The Coffee Bean 1234 Los Angeles Ca" ← Close but noisy

The problem is obvious: regex can't understand context. It doesn't know that "AMZN MKTP" means Amazon, or that "1234" is a store number and not part of the name. You end up maintaining an ever-growing list of patterns that still miss edge cases.
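One way to see how far a growing pattern list actually gets you is to score it against a small labeled set. The sketch below is an assumption about how such a check could look, not part of any library: the labeled pairs come from the table earlier in the article, and `naive_parser` is a deliberately crude, hypothetical stand-in for whatever parser you want to measure.

```python
from typing import Callable

# Labeled (raw description, expected merchant) pairs taken from the table above.
LABELED_EXAMPLES = [
    ("TST* SWEETGREEN - DOWNT WASHINGTON DC", "Sweetgreen"),
    ("SQ *THE COFFEE BEAN 1234 Los Angeles CA", "The Coffee Bean"),
    ("UBER *TRIP HELP.UBER.COM CA", "Uber"),
]

def exact_match_accuracy(parser: Callable[[str], str], examples) -> float:
    """Fraction of examples where the parser output equals the label, ignoring case."""
    if not examples:
        return 0.0
    hits = sum(
        1 for raw, expected in examples
        if parser(raw).strip().lower() == expected.lower()
    )
    return hits / len(examples)

def naive_parser(description: str) -> str:
    """Hypothetical stand-in parser: keeps only the first word after any '*'."""
    text = description.split("*", 1)[-1].strip()
    return text.split()[0].title() if text else ""

print(exact_match_accuracy(naive_parser, LABELED_EXAMPLES))
```

Passing `parse_transaction` instead of `naive_parser` scores the regex parser from this section on the same examples; rerunning the check after each new pattern shows whether the pattern helped or just shifted the failures.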
Node.js Example
function parseTransaction(description) {
  let text = description.toUpperCase().trim();

  // Remove common prefixes
  const prefixes = [
    /^CHECKCARD \d+ /,
    /^POS (DEBIT |PURCHASE )?/,
    /^SQ \*/,
    /^TST\* ?/,
    /^DD \*/,
    /^PY \*/,
  ];
  for (const prefix of prefixes) {
    text = text.replace(prefix, '');
  }

  // Remove trailing state + zip
  text = text.replace(/\s+[A-Z]{2}\s*\d{5}(-\d{4})?$/, '');
  // Remove trailing phone numbers
  text = text.replace(/\s+\d{3}[\-.]\d{3}[\-.]\d{4}.*$/, '');

  return text.trim();
}

console.log(parseTransaction("SQ *THE COFFEE BEAN 1234 Los Angeles CA 90001"));
// => "THE COFFEE BEAN 1234 LOS ANGELES" (still noisy)

Approach 2: Lookup Table + Fuzzy Matching
A better approach combines regex cleanup with a merchant database and fuzzy string matching. It handles known merchants well but still fails on local businesses that are not in the database.
# pip install fuzzywuzzy (the package is now published as "thefuzz")
from fuzzywuzzy import fuzz

KNOWN_MERCHANTS = {
    "AMZN": "Amazon",
    "AMAZON": "Amazon",
    "STARBUCKS": "Starbucks",
    "UBER": "Uber",
    "LYFT": "Lyft",
    "NETFLIX": "Netflix",
    "SPOTIFY": "Spotify",
    # ... hundreds more needed
}

def match_merchant(cleaned_text: str) -> str | None:
    # Keys are uppercase, so normalize the input before comparing
    text = cleaned_text.upper()

    # Exact substring match first
    for pattern, name in KNOWN_MERCHANTS.items():
        if pattern in text:
            return name

    # Fuzzy fallback: keep the best-scoring merchant above the threshold
    best_match = None
    best_score = 0
    for pattern, name in KNOWN_MERCHANTS.items():
        score = fuzz.partial_ratio(text, pattern)
        if score > best_score and score > 80:
            best_score = score
            best_match = name
    return best_match

This works for major chains, but you need to maintain a database of thousands of merchants. And you still can't handle local businesses, international merchants, or new companies.
Approach 3: Use a Transaction Enrichment API
The most accurate and maintainable solution is to use a dedicated transaction enrichment API. These services maintain massive merchant databases, use AI models trained on billions of transactions, and handle all the edge cases for you.
Python — Using Easy Enrichment
import requests

API_KEY = "your_api_key"

def enrich_transaction(description: str, amount: float | None = None) -> dict:
    """Parse a bank transaction using Easy Enrichment API."""
    response = requests.post(
        "https://api.easyenrichment.com/enrich",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "description": description,
            "amount": amount,
            "currency": "USD",
        },
        timeout=10,  # avoid hanging indefinitely on network problems
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

# Try it
result = enrich_transaction("CHECKCARD 0315 AMZN MKTP US*2K1AB0C9Z", 49.99)
print(result["merchant_name"])  # "Amazon"
print(result["category"])       # "Shopping"
print(result["logo_url"])       # "https://logo.easyenrichment.com/amazon.com"
print(result["mcc_code"])       # "5942"
print(result["confidence"])     # 0.97

Node.js — Using Easy Enrichment
const API_KEY = 'your_api_key';

async function enrichTransaction(description, amount) {
  const response = await fetch('https://api.easyenrichment.com/enrich', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      description,
      amount,
      currency: 'USD',
    }),
  });
  // fetch does not reject on HTTP error statuses, so check explicitly
  if (!response.ok) {
    throw new Error(`Enrichment request failed: ${response.status}`);
  }
  return response.json();
}

// Usage (top-level await requires an ES module)
const result = await enrichTransaction(
  'SQ *THE COFFEE BEAN 1234 Los Angeles CA',
  5.75
);
console.log(result.merchant_name); // "The Coffee Bean & Tea Leaf"
console.log(result.category);      // "Food & Drink"
console.log(result.subcategory);   // "Coffee Shops"

Comparing the Approaches
| Criteria | Regex | Lookup + Fuzzy | Enrichment API |
|---|---|---|---|
| Accuracy | ~40% | ~70% | ~95%+ |
| Local businesses | Fails | Fails | Works |
| Returns categories | No | Manual | Yes |
| Returns logos | No | No | Yes |
| Maintenance effort | High | High | None |
| Cost | Free | Free | $0.002/tx |
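The approaches in the table can also be combined. The sketch below tries a local lookup first and calls an enrichment function only on a miss, which keeps paid calls down for common merchants. It is an assumption about how such a hybrid could be wired, not a documented pattern: `resolve_merchant`, `LOCAL_MERCHANTS`, and the 0.8 confidence threshold are hypothetical, and the `merchant_name` and `confidence` fields are the ones shown in the Easy Enrichment examples above. The API call is injected as a function, so the example runs offline with a stub.

```python
from typing import Callable

# Illustrative local table; in practice this would be the lookup table from Approach 2.
LOCAL_MERCHANTS = {"AMZN": "Amazon", "SPOTIFY": "Spotify"}

def resolve_merchant(
    description: str,
    enrich: Callable[[str], dict],
    min_confidence: float = 0.8,  # assumed threshold, tune for your data
) -> tuple[str, str]:
    """Return (merchant_name, source), where source is 'local', 'api', or 'raw'."""
    upper = description.upper()
    # 1. Free local lookup for well-known merchants
    for token, name in LOCAL_MERCHANTS.items():
        if token in upper:
            return name, "local"
    # 2. Paid enrichment call on a miss
    result = enrich(description)
    name = result.get("merchant_name")
    if name and result.get("confidence", 0.0) >= min_confidence:
        return name, "api"
    # 3. Low confidence: show the raw description rather than a wrong name
    return description.strip(), "raw"

def fake_enrich(description: str) -> dict:
    """Stub standing in for a real API call so the sketch runs offline."""
    return {"merchant_name": "The Coffee Bean & Tea Leaf", "confidence": 0.93}

print(resolve_merchant("CHECKCARD 0315 AMZN MKTP US*2K1AB0C9Z", fake_enrich))
print(resolve_merchant("SQ *THE COFFEE BEAN 1234 Los Angeles CA", fake_enrich))
```

In a real application the `enrich` argument would be a function such as `enrich_transaction` from the Python example above.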
When to Use Each Approach
- Regex only: Quick prototyping, internal tools where accuracy doesn't matter much.
- Lookup + fuzzy matching: When you only deal with a small set of known merchants (e.g., internal expense tracking for a corporate card).
- Enrichment API: Any user-facing application where you need accurate merchant names, categories, and logos. The cost per transaction is negligible compared to the engineering time saved.
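To put the per-transaction cost in concrete terms, here is a small back-of-the-envelope calculation. It assumes flat pricing at the $0.002 per transaction listed in the comparison table, with no free tier or volume discount applied; the user count and transactions per user are made-up inputs for illustration.

```python
PRICE_PER_TRANSACTION = 0.002  # USD, from the comparison table

def monthly_api_cost(users: int, tx_per_user: int,
                     price: float = PRICE_PER_TRANSACTION) -> float:
    """Estimated monthly spend in USD, assuming every transaction is enriched once."""
    return round(users * tx_per_user * price, 2)

# Hypothetical app: 10,000 users with 60 transactions each per month
print(monthly_api_cost(10_000, 60))  # 600,000 transactions x $0.002 = 1200.0
```

Caching results for descriptions you have already enriched would lower the number of billable calls, since many users share the same merchants.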
Stop Writing Regex — Use Easy Enrichment
Parse any bank transaction description into a clean merchant name, category, logo, and more. Start free with 500 requests/month.