
Aare Domain-Specific Language Models

Purpose-built transformer models for compliance-critical entity extraction. Small footprint, high accuracy, runs entirely on-device.

DistilBERT / Phi-3 · CoreML / ONNX · 4-bit Quantization · On-Device Inference

What is a DSLM?

A Domain-Specific Language Model (DSLM) is a language model pre-trained or fine-tuned on domain-specific corpora. DSLMs learn the vocabulary, terminology, and linguistic patterns unique to a particular field (healthcare, finance, legal, etc.), enabling higher accuracy on domain tasks than general-purpose models.

At Aare, we use DSLMs as the extraction layer in the Aare Edge verification pipeline. They convert unstructured text into structured facts that can be formally verified by Z3 Lite.

Unstructured Text (Input) → DSLM (Extract) → Typed Entities (Output) → Z3 Lite (Verify)
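As a concrete sketch of the Extract stage, the snippet below runs a DistilBERT token-classification model through the Hugging Face transformers pipeline as a stand-in for the Aare Edge SDK's inference runtime. The model id is hypothetical and used only for illustration.

```python
from transformers import pipeline

# Minimal sketch of the Extract stage, assuming a DistilBERT NER checkpoint.
# "aare/pii-distilbert" is a hypothetical model id, not a published artifact;
# any token-classification model fine-tuned with BIO tags works the same way.
ner = pipeline(
    "token-classification",
    model="aare/pii-distilbert",
    aggregation_strategy="simple",   # merge B-/I- word pieces into whole entities
)

text = "Applicant Jane Doe (jane@example.com) requested a $250,000 loan."

# Each result is a typed, scored span with character offsets -- the kind of
# structured fact the verification layer consumes, e.g.
# {"entity_group": "EMAIL", "word": "jane@example.com", "score": 0.99,
#  "start": 20, "end": 36}
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```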

Available Models

Pre-trained DSLMs for common compliance domains. Each model is optimized for on-device inference with minimal footprint.

Coming Soon

Fair Lending Extractor

Financial Services

Extracts loan parameters, applicant data, and decision factors from underwriting text. Built for ECOA and fair lending compliance.

Base Model: DistilBERT
Parameters: 67M
Status: In Development
Entities: LOAN_AMOUNT, CREDIT_SCORE, DTI_RATIO, INCOME, DECISION
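For illustration, here is the kind of typed-entity output such a model would produce from an underwriting note. Only the entity labels come from the card above; the dict layout is an assumed shape for this sketch, not the SDK's actual schema.

```python
# Illustrative only: entity labels match the card above, but the dict layout
# is an assumed shape, not the Aare Edge SDK's actual output schema.
note = ("Approved: applicant reported $92,000 annual income, "
        "credit score 714, DTI of 31%, requesting a $250,000 loan.")

spans = [("DECISION", "Approved"), ("INCOME", "$92,000"),
         ("CREDIT_SCORE", "714"), ("DTI_RATIO", "31%"),
         ("LOAN_AMOUNT", "$250,000")]

# Typed entities with character offsets, ready to hand to the verification layer.
extracted = [{"type": label, "text": surface,
              "start": note.index(surface),
              "end": note.index(surface) + len(surface)}
             for label, surface in spans]
```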
Coming Soon

General PII Detector

Privacy / GDPR / CCPA

Broad PII detection for privacy compliance across industries. Covers personal identifiers, financial data, and biometric markers.

Base Model: DistilBERT
Parameters: 67M
Status: In Development
Entities: NAME, EMAIL, PHONE, SSN, PASSPORT, BIOMETRIC
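A common downstream use of these spans in GDPR/CCPA workflows is masking detected PII before text is stored or shared. A minimal sketch, assuming the detector has already returned entity types and surface strings:

```python
# Minimal redaction sketch: replace detected PII spans with their entity type.
# The "detected" list stands in for the model's output on this sentence.
text = "Reach Jane Doe at jane.doe@example.com or +1 415 555 0100."

detected = [("NAME", "Jane Doe"), ("EMAIL", "jane.doe@example.com"),
            ("PHONE", "+1 415 555 0100")]

spans = [{"type": t, "start": text.index(s), "end": text.index(s) + len(s)}
         for t, s in detected]

# Rewrite right-to-left so earlier character offsets stay valid while editing.
redacted = text
for span in sorted(spans, key=lambda s: s["start"], reverse=True):
    redacted = redacted[:span["start"]] + f"[{span['type']}]" + redacted[span["end"]:]

print(redacted)  # Reach [NAME] at [EMAIL] or [PHONE].
```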
Custom Training

Your Domain

Any Compliance Domain

Need entity extraction for a specialized domain? We train custom DSLMs on your data with your entity schema.

Base Model: Your Choice
Training Data: Your Data
Entities: Your Schema (CUSTOM_ENTITY_1, CUSTOM_ENTITY_2, ...)
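For fine-tuned NER with BIO tagging (see the specifications below), a custom schema is just a list of entity types expanded into B-/I- labels plus a shared O label. The entity names here are placeholders, not a real customer schema:

```python
# Placeholder schema: each custom entity type expands to B-<TYPE> (begin) and
# I-<TYPE> (inside) labels for BIO tagging, plus a shared "O" (outside) label.
custom_entities = ["CONTRACT_ID", "EFFECTIVE_DATE", "COUNTERPARTY"]

labels = ["O"] + [f"{prefix}-{entity}" for entity in custom_entities
                  for prefix in ("B", "I")]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

# These mappings configure a token-classification head, e.g.
# DistilBertForTokenClassification.from_pretrained(
#     "distilbert-base-uncased", num_labels=len(labels),
#     id2label=id2label, label2id=label2id)
print(labels)
# ['O', 'B-CONTRACT_ID', 'I-CONTRACT_ID', 'B-EFFECTIVE_DATE',
#  'I-EFFECTIVE_DATE', 'B-COUNTERPARTY', 'I-COUNTERPARTY']
```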

Technical Specifications

All Aare DSLMs are designed for edge deployment: small enough to run on mobile devices, fast enough for real-time inference, accurate enough for compliance-critical applications.

Specification       | HIPAA DSLM                       | Standard DSLMs
Architecture        | DistilBERT (6-layer transformer) | DistilBERT or Phi-3-mini
Parameters          | 67M                              | 67M - 3.8B
Model Format        | CoreML (.mlpackage)              | CoreML, ONNX, TensorFlow Lite
Quantization        | FP16                             | FP16 / INT8 / INT4
Model Size          | ~127 MB                          | 50 MB - 500 MB
Inference Time      | <50 ms                           | 20-100 ms
Max Sequence Length | 512 tokens                       | 512-2048 tokens
Tokenizer           | WordPiece (BERT vocab)           | WordPiece or BPE
Training Method     | Fine-tuned NER with BIO tagging  | Fine-tuned NER / LoRA
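The INT8 figures above come from weight quantization of the exported model. As one hedged example, dynamic INT8 quantization of an ONNX export with ONNX Runtime looks like this; file names are placeholders, and FP16 CoreML and INT4 conversions use other toolchains not shown here.

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Sketch: dynamic INT8 weight quantization of a DistilBERT ONNX export.
# File names are placeholders, not shipped Aare artifacts. INT8 weights are
# roughly 4x smaller than the FP32 export.
quantize_dynamic(
    model_input="distilbert-ner-fp32.onnx",
    model_output="distilbert-ner-int8.onnx",
    weight_type=QuantType.QInt8,
)
```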

Why DSLMs Over General LLMs?

1000x Smaller Footprint

A 67M-parameter DSLM fits in ~127 MB; GPT-4 class models require 100 GB+. DSLMs run on phones, watches, and embedded devices.

100x Faster Inference

Sub-50ms inference vs. seconds for cloud LLM calls. Real-time entity extraction without network latency.

Higher Accuracy on Domain

Fine-tuned on domain-specific data, DSLMs outperform general models on their target entity types.

Deterministic Output

No temperature, no sampling variation. The same input always produces the same entities, which is critical for compliance (see the sketch at the end of this section).

No Data Leaves Device

All inference runs locally. PHI, PII, and sensitive data never transmitted. True on-device privacy.

Works Offline

No network required. Extract entities in airgapped environments, airplanes, or areas with poor connectivity.
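The determinism point above is mechanical: a BIO-tagging head emits one logit vector per token and decoding is a plain argmax, with no temperature or sampling step. A toy illustration with made-up logits:

```python
import torch

# Toy illustration of deterministic decoding: argmax over per-token logits.
# Labels and logit values are made up for a three-token input.
labels = ["O", "B-NAME", "I-NAME"]

logits = torch.tensor([[[4.1, 0.2, 0.1],    # "Contact" -> O
                        [0.3, 5.0, 0.2],    # "Jane"    -> B-NAME
                        [0.2, 0.4, 4.7]]])  # "Doe"     -> I-NAME

pred = logits.argmax(dim=-1)                  # same logits -> same labels, every run
print([labels[i] for i in pred[0].tolist()])  # ['O', 'B-NAME', 'I-NAME']
```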

Licensing

Aare DSLMs are available under commercial license. Model weights are proprietary and optimized for specific compliance domains. The Aare Edge SDK (inference runtime, Z3 Lite, tokenizer) is MIT licensed and open source.

Get Started with DSLMs

SDK is MIT licensed. Model weights under commercial license.

GitHub: https://github.com/aare-ai