StructFormer

Transformer-based model built to automate structured data transformation from one schema to another

StructFormer is a Transformer-based deep learning model designed to learn adjustments on structured data and generate corrective SQL statements. It is especially useful in enterprise workflows such as financial reconciliation, trade corrections, or regulatory compliance โ€” where minor data inconsistencies require systematic adjustments.

This model can be trained on a dataset of structured validation errors and their corresponding SQL fixes, and then generate valid SQL adjustments for new errors using beam search or greedy decoding.


๐Ÿ” Problem it Solves

Structured datasets in enterprises often have issues such as missing values, mismatched fields, or incorrect classifications. Manual correction is time-consuming and prone to errors. StructFormer automates this by learning patterns in these adjustments and applying them to unseen errors, significantly reducing manual intervention.


๐Ÿง  Key Features

  • Transformer encoder-decoder model trained on structured error logs
  • SentencePiece tokenizer to handle out-of-vocabulary tokens (like dynamic trade IDs)
  • Generates SQL UPDATE / INSERT statements based on error descriptions
  • Supports windowed token training for longer sequences
  • Can be trained incrementally and deployed via FastAPI or Hugging Face Spaces

๐Ÿ—๏ธ System Architecture

The StructFormer pipeline involves tokenizing structured errors, encoding them, decoding corrective sequences, and detokenizing them into executable SQL statements.

๐Ÿ’ก Technologies Used

  • Python 3, TensorFlow/Keras
  • SentencePiece tokenizer
  • FastAPI for RESTful inference
  • Jupyter + Matplotlib for evaluation
  • GitHub Actions + Colab for experimentation

๐Ÿงช Training Results

Achieved 99% validation accuracy on a custom error-adjustment dataset with a vocabulary size of 3000 and positional sequence length of 100. Below are a few prediction samples:

๐Ÿงพ Input: TradeID=29216 AccountID=ACC1003 ErrorType=Incorrect Account Type
๐ŸŽฏ Expected: UPDATE Accounts SET AccountType='Savings' WHERE AccountID='ACC1003'; ...
๐Ÿงช Predicted: UPDATE Accounts SET AccountType='Checking' WHERE AccountID='ACC1003'; ...

References