
Aare Domain-Specific Language Models

Purpose-built transformer models for compliance-critical entity extraction. Small footprint, high accuracy, runs entirely on-device.

DistilBERT / Phi-3 · CoreML / ONNX · 4-bit Quantization · On-Device Inference

What is a DSLM?

A Domain-Specific Language Model (DSLM) is a language model pre-trained or fine-tuned on domain-specific corpora. DSLMs learn the vocabulary, terminology, and linguistic patterns unique to a particular field (healthcare, finance, legal, etc.), enabling higher accuracy on domain tasks than general-purpose models.

At Aare, we use DSLMs as the extraction layer in the Aare Edge verification pipeline. They convert unstructured text into structured facts that can be formally verified by Z3 Lite.

Unstructured Text (Input) → DSLM (Extract) → Typed Entities (Output) → Z3 Lite (Verify)
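As a concrete sketch of the Extract stage, the snippet below runs a DistilBERT token-classification model through the Hugging Face transformers pipeline as a stand-in for the Aare Edge SDK's inference runtime. The model id is hypothetical and used only for illustration.

```python
from transformers import pipeline

# Minimal sketch of the Extract stage, assuming a DistilBERT NER checkpoint.
# "aare/pii-distilbert" is a hypothetical model id, not a published artifact;
# any token-classification model fine-tuned with BIO tags works the same way.
ner = pipeline(
    "token-classification",
    model="aare/pii-distilbert",
    aggregation_strategy="simple",   # merge B-/I- word pieces into whole entities
)

text = "Applicant Jane Doe (jane@example.com) requested a $250,000 loan."

# Each result is a typed, scored span with character offsets -- the kind of
# structured fact the verification layer consumes, e.g.
# {"entity_group": "EMAIL", "word": "jane@example.com", "score": 0.99,
#  "start": 20, "end": 36}
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```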

Available Models

Pre-trained DSLMs for common compliance domains. Each model is optimized for on-device inference with minimal footprint.

Coming Soon

Fair Lending Extractor

Financial Services

Extracts loan parameters, applicant data, and decision factors from underwriting text. Built for ECOA and fair lending compliance.

Base Model: DistilBERT
Parameters: 67M
Status: In Development
Entities: LOAN_AMOUNT, CREDIT_SCORE, DTI_RATIO, INCOME, DECISION
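For illustration, here is the kind of typed-entity output such a model would produce from an underwriting note. Only the entity labels come from the card above; the dict layout is an assumed shape for this sketch, not the SDK's actual schema.

```python
# Illustrative only: entity labels match the card above, but the dict layout
# is an assumed shape, not the Aare Edge SDK's actual output schema.
note = ("Approved: applicant reported $92,000 annual income, "
        "credit score 714, DTI of 31%, requesting a $250,000 loan.")

spans = [("DECISION", "Approved"), ("INCOME", "$92,000"),
         ("CREDIT_SCORE", "714"), ("DTI_RATIO", "31%"),
         ("LOAN_AMOUNT", "$250,000")]

# Typed entities with character offsets, ready to hand to the verification layer.
extracted = [{"type": label, "text": surface,
              "start": note.index(surface),
              "end": note.index(surface) + len(surface)}
             for label, surface in spans]
```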
Coming Soon

General PII Detector

Privacy / GDPR / CCPA

Broad PII detection for privacy compliance across industries. Covers personal identifiers, financial data, and biometric markers.

Base Model: DistilBERT
Parameters: 67M
Status: In Development
Entities: NAME, EMAIL, PHONE, SSN, PASSPORT, BIOMETRIC
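A common downstream use of these spans in GDPR/CCPA workflows is masking detected PII before text is stored or shared. A minimal sketch, assuming the detector has already returned entity types and surface strings:

```python
# Minimal redaction sketch: replace detected PII spans with their entity type.
# The "detected" list stands in for the model's output on this sentence.
text = "Reach Jane Doe at jane.doe@example.com or +1 415 555 0100."

detected = [("NAME", "Jane Doe"), ("EMAIL", "jane.doe@example.com"),
            ("PHONE", "+1 415 555 0100")]

spans = [{"type": t, "start": text.index(s), "end": text.index(s) + len(s)}
         for t, s in detected]

# Rewrite right-to-left so earlier character offsets stay valid while editing.
redacted = text
for span in sorted(spans, key=lambda s: s["start"], reverse=True):
    redacted = redacted[:span["start"]] + f"[{span['type']}]" + redacted[span["end"]:]

print(redacted)  # Reach [NAME] at [EMAIL] or [PHONE].
```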
Custom Training

Your Domain

Any Compliance Domain

Need entity extraction for a specialized domain? We train custom DSLMs on your data with your entity schema.

Base Model: Your Choice
Training Data: Your Data
Entities: Your Schema (CUSTOM_ENTITY_1, CUSTOM_ENTITY_2, ...)
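For fine-tuned NER with BIO tagging (see the specifications below), a custom schema is just a list of entity types expanded into B-/I- labels plus a shared O label. The entity names here are placeholders, not a real customer schema:

```python
# Placeholder schema: each custom entity type expands to B-<TYPE> (begin) and
# I-<TYPE> (inside) labels for BIO tagging, plus a shared "O" (outside) label.
custom_entities = ["CONTRACT_ID", "EFFECTIVE_DATE", "COUNTERPARTY"]

labels = ["O"] + [f"{prefix}-{entity}" for entity in custom_entities
                  for prefix in ("B", "I")]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

# These mappings configure a token-classification head, e.g.
# DistilBertForTokenClassification.from_pretrained(
#     "distilbert-base-uncased", num_labels=len(labels),
#     id2label=id2label, label2id=label2id)
print(labels)
# ['O', 'B-CONTRACT_ID', 'I-CONTRACT_ID', 'B-EFFECTIVE_DATE',
#  'I-EFFECTIVE_DATE', 'B-COUNTERPARTY', 'I-COUNTERPARTY']
```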

Technical Specifications

All Aare DSLMs are designed for edge deployment: small enough to run on mobile devices, fast enough for real-time inference, accurate enough for compliance-critical applications.

Specification       | HIPAA DSLM                       | Standard DSLMs
Architecture        | DistilBERT (6-layer transformer) | DistilBERT or Phi-3-mini
Parameters          | 67M                              | 67M - 3.8B
Model Format        | CoreML (.mlpackage)              | CoreML, ONNX, TensorFlow Lite
Quantization        | FP16                             | FP16 / INT8 / INT4
Model Size          | ~127 MB                          | 50 MB - 500 MB
Inference Time      | <50 ms                           | 20-100 ms
Max Sequence Length | 512 tokens                       | 512-2048 tokens
Tokenizer           | WordPiece (BERT vocab)           | WordPiece or BPE
Training Method     | Fine-tuned NER with BIO tagging  | Fine-tuned NER / LoRA
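The INT8 figures above come from weight quantization of the exported model. As one hedged example, dynamic INT8 quantization of an ONNX export with ONNX Runtime looks like this; file names are placeholders, and FP16 CoreML and INT4 conversions use other toolchains not shown here.

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Sketch: dynamic INT8 weight quantization of a DistilBERT ONNX export.
# File names are placeholders, not shipped Aare artifacts. INT8 weights are
# roughly 4x smaller than the FP32 export.
quantize_dynamic(
    model_input="distilbert-ner-fp32.onnx",
    model_output="distilbert-ner-int8.onnx",
    weight_type=QuantType.QInt8,
)
```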

Why DSLMs Over General LLMs?

1000x Smaller Footprint

A 67M-parameter DSLM fits in ~127 MB; GPT-4 class models require 100 GB+. DSLMs run on phones, watches, and embedded devices.

100x Faster Inference

Sub-50ms inference vs. seconds for cloud LLM calls. Real-time entity extraction without network latency.

Higher Accuracy on Domain

Fine-tuned on domain-specific data, DSLMs outperform general models on their target entity types.

Deterministic Output

No temperature, no sampling variation. The same input always produces the same entities, which is critical for compliance (see the sketch at the end of this section).

No Data Leaves Device

All inference runs locally. PHI, PII, and sensitive data never transmitted. True on-device privacy.

Works Offline

No network required. Extract entities in airgapped environments, airplanes, or areas with poor connectivity.
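The determinism point above is mechanical: a BIO-tagging head emits one logit vector per token and decoding is a plain argmax, with no temperature or sampling step. A toy illustration with made-up logits:

```python
import torch

# Toy illustration of deterministic decoding: argmax over per-token logits.
# Labels and logit values are made up for a three-token input.
labels = ["O", "B-NAME", "I-NAME"]

logits = torch.tensor([[[4.1, 0.2, 0.1],    # "Contact" -> O
                        [0.3, 5.0, 0.2],    # "Jane"    -> B-NAME
                        [0.2, 0.4, 4.7]]])  # "Doe"     -> I-NAME

pred = logits.argmax(dim=-1)                  # same logits -> same labels, every run
print([labels[i] for i in pred[0].tolist()])  # ['O', 'B-NAME', 'I-NAME']
```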

Licensing

Aare DSLMs are available under commercial license. Model weights are proprietary and optimized for specific compliance domains. The Aare Edge SDK (inference runtime, Z3 Lite, tokenizer) is MIT licensed and open source.

Get Started with DSLMs

SDK is MIT licensed. Model weights under commercial license.

GitHub: https://github.com/aare-ai