StructFormer
Transformer-based model built to automate structured data transformation from one schema to another
StructFormer is a Transformer-based deep learning model designed to learn adjustments on structured data and generate corrective SQL statements. It is especially useful in enterprise workflows such as financial reconciliation, trade corrections, or regulatory compliance, where minor data inconsistencies require systematic adjustments.
This model can be trained on a dataset of structured validation errors and their corresponding SQL fixes, and then generate valid SQL adjustments for new errors using beam search or greedy decoding.
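To make the difference between the two decoding strategies concrete, here is a minimal sketch in plain Python. It does not use the StructFormer model: `toy_next_token` is a hypothetical stand-in for the decoder's next-token distribution, and its probabilities are invented purely for illustration. Greedy decoding takes the most likely token at each step, while beam search keeps several partial sequences alive and can recover a statement whose first token was not the top choice.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence marker

# Hypothetical stand-in for a trained decoder: maps a token prefix to
# next-token probabilities. A real model would run a forward pass here.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def toy_next_token(prefix: Tuple[str, ...]) -> Dict[str, float]:
    table = {
        (): {"UPDATE": 0.6, "INSERT": 0.4},
        ("UPDATE",): {"Accounts": 0.55, "Trades": 0.45},
        ("UPDATE", "Accounts"): {EOS: 1.0},
        ("UPDATE", "Trades"): {EOS: 1.0},
        ("INSERT",): {"INTO": 1.0},
        ("INSERT", "INTO"): {EOS: 1.0},
    }
    return table[prefix]


def greedy_decode(next_token: NextTokenFn, max_len: int = 10) -> List[str]:
    """Pick the single most probable token at every step."""
    prefix: Tuple[str, ...] = ()
    while len(prefix) < max_len:
        probs = next_token(prefix)
        token = max(probs, key=probs.get)
        if token == EOS:
            break
        prefix += (token,)
    return list(prefix)


def beam_search_decode(
    next_token: NextTokenFn, beam_width: int = 2, max_len: int = 10
) -> List[str]:
    """Keep the `beam_width` best partial sequences by total log-probability."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, p in next_token(prefix).items():
                if p <= 0.0:
                    continue
                new_score = score + math.log(p)
                if token == EOS:
                    finished.append((prefix, new_score))
                else:
                    candidates.append((prefix + (token,), new_score))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    best_prefix, _ = max(finished or beams, key=lambda c: c[1])
    return list(best_prefix)


print(greedy_decode(toy_next_token))
print(beam_search_decode(toy_next_token, beam_width=2))
```

With these toy probabilities, greedy decoding commits to `UPDATE` because it is the likeliest first token, whereas a beam of width 2 also follows `INSERT` and ends up with the higher-probability full sequence. Beam search costs roughly `beam_width` times more decoder calls per step, so the width is a trade-off between output quality and latency.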
Problem it Solves
Structured datasets in enterprises often have issues such as missing values, mismatched fields, or incorrect classifications. Manual correction is time-consuming and prone to errors. StructFormer automates this by learning patterns in these adjustments and applying them to unseen errors, significantly reducing manual intervention.
Key Features
- Transformer encoder-decoder model trained on structured error logs
- SentencePiece tokenizer to handle out-of-vocabulary tokens (like dynamic trade IDs)
- Generates SQL UPDATE/INSERT statements based on error descriptions
- Supports windowed token training for longer sequences
- Can be trained incrementally and deployed via FastAPI or Hugging Face Spaces
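The project does not describe how windowed token training is implemented, so the following is only one plausible sketch: a token-ID sequence longer than the model's sequence length is cut into fixed-size, overlapping windows, and the final window is padded. The function name `make_windows`, the `stride` and `pad_id` parameters, and their default values are assumptions made for illustration.

```python
from typing import List


def make_windows(
    token_ids: List[int],
    window_size: int = 100,  # assumed to match the model's sequence length
    stride: int = 50,        # assumed overlap of half a window
    pad_id: int = 0,         # assumed padding token ID
) -> List[List[int]]:
    """Split a token sequence into overlapping fixed-size windows."""
    if window_size <= 0 or not 0 < stride <= window_size:
        raise ValueError("require window_size > 0 and 0 < stride <= window_size")

    windows = []
    start = 0
    while True:
        chunk = token_ids[start:start + window_size]
        # Right-pad the last (possibly short) window to the full size.
        chunk = chunk + [pad_id] * (window_size - len(chunk))
        windows.append(chunk)
        if start + window_size >= len(token_ids):
            break
        start += stride
    return windows


# Small example: 7 tokens, windows of 4, stride of 2.
for window in make_windows([1, 2, 3, 4, 5, 6, 7], window_size=4, stride=2):
    print(window)
```

Overlapping windows let tokens near a window boundary appear with context on both sides in at least one training example, at the cost of some duplicated tokens per epoch.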
System Architecture

Technologies Used
- Python 3, TensorFlow/Keras
- SentencePiece tokenizer
- FastAPI for RESTful inference
- Jupyter + Matplotlib for evaluation
- GitHub Actions + Colab for experimentation
Training Results
The model reached 99% validation accuracy on a custom error-adjustment dataset, with a vocabulary size of 3000 and a positional sequence length of 100. Below are a few prediction samples:
Input: TradeID=29216 AccountID=ACC1003 ErrorType=Incorrect Account Type
Expected: UPDATE Accounts SET AccountType='Savings' WHERE AccountID='ACC1003'; ...
Predicted: UPDATE Accounts SET AccountType='Checking' WHERE AccountID='ACC1003'; ...
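The source does not say how validation accuracy is computed. As a hedged illustration of how a sample like this one could be inspected, the sketch below compares an expected and a predicted statement both token by token and as a whole. It uses only the visible part of each statement (the elided remainder is left out), splits on whitespace rather than with the project's SentencePiece tokenizer, and the helper names `token_accuracy` and `exact_match` are hypothetical.

```python
def token_accuracy(expected: str, predicted: str) -> float:
    """Fraction of positions where whitespace-separated tokens agree."""
    exp_tokens = expected.split()
    pred_tokens = predicted.split()
    length = max(len(exp_tokens), len(pred_tokens))
    if length == 0:
        return 1.0
    matches = sum(e == p for e, p in zip(exp_tokens, pred_tokens))
    return matches / length


def exact_match(expected: str, predicted: str) -> bool:
    """True only if every token agrees."""
    return expected.split() == predicted.split()


expected = "UPDATE Accounts SET AccountType='Savings' WHERE AccountID='ACC1003';"
predicted = "UPDATE Accounts SET AccountType='Checking' WHERE AccountID='ACC1003';"

print(f"token accuracy: {token_accuracy(expected, predicted):.3f}")
print(f"exact match:    {exact_match(expected, predicted)}")
```

Here five of the six whitespace tokens agree, yet the statement as a whole would set a different account type. For generated SQL, reporting an exact-match rate alongside any per-token figure gives a clearer picture of how many statements are safe to apply unchanged.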