Pricing Intelligence: Detecting Misleading Listing Prices Across 5 Markets

Marketplace Application

Online marketplaces routinely receive listings where the advertised price is misleading: tied to financing deals, leasing conditions, or VAT exclusions that most buyers don't qualify for. This system automatically detects those listings at scale, protecting buyer trust and flagging them for review before they go live. It applies to any marketplace where price transparency matters: automotive, property, rental, B2B, or e-commerce.

Project Summary

Domain: Online Marketplace / Pricing Transparency
Role: ML Engineer (sole data scientist)
Scope: 5 markets, 6 languages, production API + batch pipeline

20x Faster than Claude
>2x Cheaper at Scale
90-95% Detection Rate
5 Markets, 6 Languages

Key Result: In-house model is 20x faster and >2x cheaper than Claude while maintaining comparable detection performance

The Problem

In online marketplaces, some listings advertise prices that come with conditions: financing requirements, leasing terms, special-buyer restrictions, or VAT exclusions. These conditional prices are misleading for standard buyers who expect transparent pricing. Manually reviewing thousands of listings across multiple markets and languages was unsustainable.

The goal: Build an automated system that flags conditional pricing across 5 international markets with high accuracy and low latency.

Model Architecture

🧠 Encoders learn contextual word meaning with a tiny memory footprint, enabling quick iteration and faster fine-tuning on a single 24 GB GPU

Listing Description
→
✂ Trimmer: extracts sentences with pricing-relevant keywords
→
🤖 mDeBERTa-v3-base (encoder, 12 layers, 86M parameters, multilingual), used for: DE
🤖 DeBERTa-v3-large (encoder, 24 layers, 304M parameters), used for: IT, AT, CA, BE
→
🔲 Binary Classifier: Conditional / Non-Conditional
⊞ Multi Classifier: Financing · Incentives · Leasing · Special Buyers · Others

💡 Why Two Heads? Fine-tuning both tasks simultaneously forces the encoder to learn richer shared representations: the binary signal sharpens category boundaries, and the category signal anchors binary decisions. Single-task models for each head performed worse individually than this joint approach.

My Approach

Dual-Head Transformer Encoder Fine-Tuning

Rather than fine-tuning two separate models, I designed a single dual-head classifier that learns both tasks simultaneously in one forward pass:

  • Binary head: Conditional vs. Non-Conditional, detecting whether the advertised price is achievable by a standard buyer
  • Multi-label head: 7 condition categories (Financing, Leasing, Incentives, Special Buyers, VAT Excluded, Other, OK)

Joint fine-tuning improved accuracy on both tasks: the binary signal sharpens category boundaries, and the category signal anchors the binary decision. One model, two outputs, better performance than either standalone.
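The dual-head design described above can be sketched as a shared encoder with two linear heads and a weighted joint loss. This is a minimal illustration, not the production code: the checkpoint name, pooling choice, and loss weight `alpha` are assumptions.

```python
# Minimal sketch of a dual-head classifier on a shared transformer encoder.
# Checkpoint name, [CLS] pooling, and the alpha loss weight are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel

class DualHeadClassifier(nn.Module):
    def __init__(self, encoder_name="microsoft/deberta-v3-large", n_categories=7):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.binary_head = nn.Linear(hidden, 2)            # Conditional vs Non-Conditional
        self.multi_head = nn.Linear(hidden, n_categories)  # 7 condition categories

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # first-token pooled representation
        return self.binary_head(cls), self.multi_head(cls)

def joint_loss(binary_logits, multi_logits, binary_y, multi_y, alpha=0.5):
    # Weighted sum of both task losses; one backward pass updates the shared
    # encoder from both signals, which is what drives the joint-training gain.
    bce = nn.functional.cross_entropy(binary_logits, binary_y)
    mlb = nn.functional.binary_cross_entropy_with_logits(multi_logits, multi_y)
    return alpha * bce + (1 - alpha) * mlb
```

Because both heads share one encoder, gradients from the multi-label task regularise the binary decision boundary and vice versa, in a single forward pass.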

Language-Aware Model Selection

Rather than forcing a single model across all markets, I selected architecturally appropriate models:

Model | Markets | Rationale
DeBERTa V3 Large | IT, AT, CA, BE | Superior performance on English-centric and Romance-language text
mDeBERTa V3 Base | DE | Better multilingual representations for German compound words and syntax

Feature Engineering

  • Country-specific keyword extraction: 70-71 priority stems per market to extract pricing-relevant sentences before they are fed to the model
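The trimmer step can be sketched as a simple sentence filter. The stems below are illustrative examples only; the real per-market lists hold around 70 stems each.

```python
# Sketch of the "Trimmer": keep only sentences that contain a pricing-relevant
# keyword stem. The German stems listed here are assumed examples.
import re

DE_STEMS = ["finanzierung", "leasing", "anzahlung", "mwst", "netto"]

def trim_listing(text: str, stems: list[str]) -> str:
    # Naive sentence split on terminal punctuation, then substring match.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    keep = [s for s in sentences if any(stem in s.lower() for stem in stems)]
    return " ".join(keep)

listing = "Schönes Auto mit Vollausstattung. Preis nur bei Finanzierung gültig."
print(trim_listing(listing, DE_STEMS))  # -> "Preis nur bei Finanzierung gültig."
```

Trimming before encoding keeps inputs short, which matters once sequences are truncated to a fixed token budget downstream.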

Hybrid Inference: Transformers + LLM Fallback

For predictions where model confidence falls below 0.8, the system routes the listing to Claude via AWS Bedrock for validation.

Listing → Transformer Inference → Confidence Check
  ≥ 0.8: return prediction (sub-second)
  < 0.8: Claude validation (+2-5s) → return prediction

This hybrid approach optimises for speed on high-confidence predictions while using LLM reasoning only for genuinely ambiguous edge cases.
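The routing logic above reduces to a softmax-confidence threshold check. In this sketch, `claude_validate` is a hypothetical placeholder standing in for the Bedrock call.

```python
# Sketch of confidence-based routing between the transformer and the LLM
# fallback. `claude_validate` is a placeholder for the AWS Bedrock call.
import torch

CONFIDENCE_THRESHOLD = 0.8

def route_prediction(logits: torch.Tensor, claude_validate) -> dict:
    probs = torch.softmax(logits, dim=-1)
    conf, label = probs.max(dim=-1)
    if conf.item() >= CONFIDENCE_THRESHOLD:
        # High confidence: answer locally in sub-second time.
        return {"label": int(label), "confidence": conf.item(), "source": "transformer"}
    # Low confidence: defer to the LLM (+2-5s latency) for validation.
    return {"label": claude_validate(), "confidence": conf.item(), "source": "claude"}
```

The threshold is the main cost lever: raising it trades more Claude calls (slower, costlier) for fewer transformer mistakes on ambiguous listings.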

Automated Evaluation Pipeline

I built a stratified validation system using Claude as ground truth:

  1. Sample 250 listings per country (stratified by predicted class)
  2. Generate Claude labels with structured reasoning
  3. Evaluate transformer predictions against Claude ground truth
  4. Track per-country precision/recall over time
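Step 1 can be sketched with pandas, assuming predictions live in a DataFrame; the column name `predicted_class` and the per-class count are assumptions (e.g. 50 per class across 5 flagged categories approximates the 250-listing sample).

```python
# Sketch of stratified sampling by predicted class for LLM-as-judge
# evaluation. Column name and sample sizes are assumptions.
import pandas as pd

def stratified_sample(df: pd.DataFrame, n_per_class: int = 50, seed: int = 42) -> pd.DataFrame:
    # Draw the same number of listings from each predicted class; assumes
    # every class has at least n_per_class rows.
    return df.groupby("predicted_class").sample(n=n_per_class, random_state=seed)
```

Stratifying by predicted class ensures rare categories are evaluated at the same depth as the dominant "OK" class, which a uniform random sample would miss.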

Production System

Deployment Pipeline

๐Ÿ” CI/CD Pipeline
FastAPI App
(classifier service)
โ†’
Docker Image
(containerised)
โ†’
GitHub Actions
(build & push)
โ†’
AWS ECR
(container registry)
โ˜ AWS Infrastructure (CloudFormation)
ECR Image
(pulled on deploy)
โ†’
CloudFormation
(stack launch)
โ†’
Auto Scaling
Group
(EC2 GPU)
โ†’
Target Group
(load balancer)
โ†’
Route 53
(DNS endpoint)

Real-Time API (FastAPI):

  • Accepts listing description, price, and country code
  • Returns classification with confidence score in <2s
  • Falls back to Claude for uncertain predictions
  • Includes business rule overrides (e.g., known leasing providers auto-flagged)

Monitoring: Full Datadog integration tracking prediction distributions, Claude API costs, conditional listing rates per market, and model latency.

Fine-Tuning Under Constraints

Fine-tuning a 304M parameter transformer encoder on a single 24 GB GPU required engineering around every memory bottleneck. Real constraints from the fine-tuning run:

Constraint | Problem | Solution
24 GB VRAM | Model + activations exceeded memory | Gradient checkpointing: recompute activations on the backward pass instead of storing them
Max batch size 8 | Too noisy for stable convergence | Gradient accumulation over 4 steps, for an effective batch of 32 without extra memory
Long input sequences | Full 512-token inputs too slow | Input truncation to 384 tokens: cut training time significantly with negligible accuracy loss
FP32 precision | Doubled memory for weights/activations | FP16 mixed-precision training throughout
DeBERTa-v3-large size | 304M params barely fit | All four techniques combined to make fine-tuning feasible on a single GPU
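The techniques in the table map directly onto HuggingFace `TrainingArguments`. The batch, accumulation, checkpointing, and precision values come from the table; the learning rate and epoch count are illustrative assumptions.

```python
# Config sketch: the four memory techniques from the table expressed as
# HuggingFace TrainingArguments. Learning rate and epochs are assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="deberta-v3-large-pricing",
    per_device_train_batch_size=8,   # max that fits in 24 GB VRAM
    gradient_accumulation_steps=4,   # effective batch of 32
    gradient_checkpointing=True,     # recompute activations on backward
    fp16=True,                       # mixed precision throughout
    max_steps=-1,
    learning_rate=1e-5,              # assumed
    num_train_epochs=3,              # assumed
)
```

Gradient checkpointing trades roughly one extra forward pass of compute per step for a large cut in activation memory, which is what makes the 304M-parameter model fit.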

The disentangled attention mechanism in DeBERTa encodes each token using separate content and position embeddings; this is what gives the model strong contextual understanding at a relatively modest parameter count.

Tech Stack

Python PyTorch HuggingFace Transformers DeBERTa V3 mDeBERTa V3 FastAPI AWS Bedrock Claude AWS Athena AWS S3 AWS EC2 (GPU) Papermill Datadog Docker

Why In-House Over Claude?

The core business case: fine-tuning a transformer model instead of routing everything through Claude delivered dramatic cost and speed gains at no meaningful accuracy cost.

Detection Performance

Method | Detection Rate | False Positive Rate
Claude | 99% | 1%
Transformer Model | 90-95% | 4-7%

A small accuracy trade-off in exchange for 20x faster inference and a cost curve that stays flat as volume scales.

Inference Speed

Method | Listings per Day | Inference Time
Claude | 60K | 2-3 hours
Transformer Model | 60K | ~10 minutes

Cost at Scale

Listings per Day | GPU Cost p.a. (Transformer) | Claude Cost p.a.
20K | $14K | $10K
40K | $14K | $23K
60K | $14K | $42K
80K | $14K | $56K
100K | $14K | $70K

The transformer's GPU cost is fixed at $14K/year regardless of volume, while Claude's cost scales roughly linearly with volume; at 100K listings/day the in-house model is 5x cheaper.
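The break-even point implied by the table can be checked with a quick interpolation; this is a back-of-envelope sketch using only the figures listed above.

```python
# Back-of-envelope break-even check using the cost table above: interpolate
# Claude's annual cost between listed volumes and find where it crosses the
# fixed GPU cost.
gpu_cost = 14_000  # fixed annual GPU cost (from the table)
claude_costs = {20_000: 10_000, 40_000: 23_000, 60_000: 42_000,
                80_000: 56_000, 100_000: 70_000}  # annual cost by daily volume

def breakeven_volume() -> float:
    points = sorted(claude_costs.items())
    for (v0, c0), (v1, c1) in zip(points, points[1:]):
        if c0 <= gpu_cost <= c1:
            # Linear interpolation within the crossing segment.
            return v0 + (gpu_cost - c0) * (v1 - v0) / (c1 - c0)
    return float("nan")

print(round(breakeven_volume()))  # roughly 26K listings/day under this interpolation
```

Under this interpolation, the in-house model pays for itself somewhere above ~26K listings/day, and every listing beyond that widens the gap.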

Key Takeaways

  • Right-sizing models matters: DeBERTa vs mDeBERTa selection per market improved accuracy without unnecessary compute
  • Confidence-based routing between fast local models and LLMs is a production pattern I now use everywhere
  • Country-specific feature engineering (keyword stems, text extraction) outperformed language-agnostic approaches
  • Automated evaluation pipelines with LLM-as-judge provide scalable quality assurance across markets