
Documentation: Life Insurance - LDEx Data Quality Analysis Package: How the Artifacts Connect

  • Dataprepr Team
  • Nov 22
  • 6 min read

The Complete Picture: Artifact Relationships


Think of this toolkit as a complete data quality system where each file plays a specific role. Everything connects to help you answer one question: "Is my LDEx enrollment data good enough to load into production?" The toolkit gives you an objective, comprehensive, repeatable answer to that question - every single time.


Here's how the files connect:



                    ┌─────────────────────────────┐
                    │   YOUR LDEx XML FILE        │
                    │  (enrollment data)          │
                    └──────────┬──────────────────┘
                               │
                               ▼
                    ┌─────────────────────────────┐
                    │  ldex_data_quality_         │
                    │  analysis.R                 │◄─────┐
                    │  (THE ANALYZER)             │      │
                    └──────────┬──────────────────┘      │
                               │                         │
                               ▼                         │
                    ┌─────────────────────────────┐      │
                    │  Data Quality Report.xlsx   │      │
                    │  (THE OUTPUT)               │      │
                    └─────────────────────────────┘      │
                                                         │
         Documentation & Support Files ─────────────────┘
         • README.md ...................... Quick start guide
         • DATA_QUALITY_ANALYSIS_GUIDE.md . Full methodology
         • PACKAGE_INDEX.md ............... File reference
         • ADVANCED_LLM_PROMPT.md ......... AI analysis helper
         • sample_ldex_life_enrollment.xml. Example data
         • LDEx_Data_Quality_Report_Sample.xlsx ... Example output
         • generate_sample_report.py ...... Demo generator

The Three Core Components


1. THE INPUT → sample_ldex_life_enrollment.xml

  • What it is: Example LDEx enrollment file

  • Purpose: Shows you what data format the analyzer expects

  • Links to: The R script reads this (or your real file)

  • How to use: Replace this with your actual LDEx files


2. THE ANALYZER → ldex_data_quality_analysis.R

  • What it is: 700+ lines of R code that does all the work

  • Purpose: Reads XML, runs 30+ checks, generates Excel report

  • Links to:

    • Reads: Your LDEx XML file

    • Outputs: Excel report

    • References: All documentation files for methodology

  • How to use: Configure file paths, then run it
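To make the "read XML, then check it" idea concrete, here is a minimal sketch in Python using only the standard library. It is not the analyzer's code (the analyzer is an R script), and the element names (`Enrollment`, `Member`, `MemberID`, `LastName`, `Coverage`) are hypothetical stand-ins rather than the real LDEx schema. It shows one extraction pass followed by a simple completeness measure.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature enrollment document. Real LDEx files use their own
# element names and namespaces; these tags are for illustration only.
SAMPLE_XML = """
<Enrollment>
  <Member>
    <MemberID>M001</MemberID>
    <LastName>Rivera</LastName>
    <Coverage><Amount>50000</Amount></Coverage>
  </Member>
  <Member>
    <MemberID>M002</MemberID>
    <LastName></LastName>
  </Member>
</Enrollment>
"""


def text_or_none(parent, tag):
    """Return stripped child text, or None when the element is missing or empty."""
    child = parent.find(tag)
    if child is None or child.text is None or not child.text.strip():
        return None
    return child.text.strip()


def extract_members(xml_text):
    """Turn each <Member> element into a flat dictionary."""
    root = ET.fromstring(xml_text.strip())
    members = []
    for member in root.findall("Member"):
        members.append({
            "member_id": text_or_none(member, "MemberID"),
            "last_name": text_or_none(member, "LastName"),
            "coverage_count": len(member.findall("Coverage")),
        })
    return members


def completeness(records, field):
    """Share of records with a non-missing value for `field` (0.0 to 1.0)."""
    if not records:
        return 0.0
    return sum(r[field] is not None for r in records) / len(records)


members = extract_members(SAMPLE_XML)
print(members)
print("last_name completeness:", completeness(members, "last_name"))
```

Treating an empty element the same as a missing one is a design choice made for this sketch; the real script may distinguish the two.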


3. THE OUTPUT → LDEx_Data_Quality_Report.xlsx

  • What it is: 10-sheet comprehensive analysis report

  • Purpose: Shows all findings in business-friendly format

  • Links to: Generated by the R script

  • How to use: Open in Excel, review findings, take action


The Supporting Cast


Documentation Trio

README.md (The Quick Start)

  • Role: Your first stop

  • Links to: Points you to all other files

  • Use when: You're getting started or need quick reference

DATA_QUALITY_ANALYSIS_GUIDE.md (The Deep Dive)

  • Role: Complete methodology explanation

  • Links to: References the R script code blocks

  • Use when: You need to understand HOW it works

PACKAGE_INDEX.md (The Reference)

  • Role: Complete file catalog and navigation

  • Links to: Everything - it's the master index

  • Use when: You need to find something specific


Demo Files

LDEx_Data_Quality_Report_Sample.xlsx (The Preview)

  • Role: Shows you what the output looks like

  • Links to: This is what the R script produces

  • Use when: You want to see results WITHOUT running code

generate_sample_report.py (The Generator)

  • Role: Creates the sample Excel report

  • Links to: Produces LDEx_Data_Quality_Report_Sample.xlsx

  • Use when: You want to recreate the demo report


AI Helper

ADVANCED_LLM_PROMPT.md (The AI Guide)

  • Role: Instructions for using AI to analyze the report

  • Links to: Designed to work with the Excel output

  • Use when: You want AI assistance interpreting results


Three Ways to Use This Toolkit

OPTION 1: "Just Show Me" (5 minutes)

For business users who want to see what's possible

1. Open: LDEx_Data_Quality_Report_Sample.xlsx
2. Look at: Sheet 1 (Executive Summary)
3. Notice: Overall score, grade, key findings
4. Done! You've seen what the output looks like

Files used:

  • ✓ LDEx_Data_Quality_Report_Sample.xlsx

  • README.md (for interpretation)


OPTION 2: "Run It on My Data" (30 minutes)

For analysts who want to analyze their own LDEx files

Step 1: Install R and packages
   → Follow instructions in README.md
   
Step 2: Prepare your LDEx file
   → Place your XML file in a known location
   → Example: /data/my_enrollment.xml
   
Step 3: Configure the R script
   → Open: ldex_data_quality_analysis.R
   → Line 36: Set ldex_file path to YOUR file
   → Line 39: Set output_excel name
   
Step 4: Run the analysis
   → In R: source("ldex_data_quality_analysis.R")
   → Wait 5 seconds to 5 minutes (depending on file size)
   
Step 5: Review your report
   → Open: LDEx_Data_Quality_Report.xlsx
   → Start with Executive Summary sheet
   → Investigate any RED (fail) items

Files used:

  • ✓ ldex_data_quality_analysis.R (THE MAIN SCRIPT)

  • ✓ Your LDEx XML file

  • README.md (for setup instructions)

  • ✓ DATA_QUALITY_ANALYSIS_GUIDE.md (for understanding results)


Files produced:

  • ✓ LDEx_Data_Quality_Report.xlsx (YOUR RESULTS)
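Before running Step 4, it can save time to confirm that the file from Step 2 exists and is well-formed XML. The sketch below is an optional preflight check in Python and is not part of the package; the function name `preflight` is made up for this example, and the path is the example path from Step 2.

```python
import os
import xml.etree.ElementTree as ET


def preflight(path):
    """Return a list of problems found; an empty list means the file looks usable."""
    if not os.path.isfile(path):
        return [f"file not found: {path}"]
    if os.path.getsize(path) == 0:
        return [f"file is empty: {path}"]
    problems = []
    try:
        ET.parse(path)  # raises ParseError when the XML is not well-formed
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
    return problems


# Example path taken from Step 2 above; replace it with your own file.
for problem in preflight("/data/my_enrollment.xml"):
    print("PROBLEM:", problem)
```

This only checks that the file parses. It says nothing about whether the content matches the structure the analyzer expects.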


OPTION 3: "Customize It" (2+ hours)

For developers who want to extend the toolkit

Step 1: Understand the baseline
   → Read: DATA_QUALITY_ANALYSIS_GUIDE.md (full methodology)
   → Study: ldex_data_quality_analysis.R (code structure)
   → Review: sample_ldex_life_enrollment.xml (data format)
   
Step 2: Run baseline analysis
   → Follow Option 2 steps above
   → Understand what the default analysis does
   
Step 3: Add your custom rules
   → Reference: DATA_QUALITY_ANALYSIS_GUIDE.md (Customization Guide section)
   → Examples included for:
     - New validation rules
     - Custom anomalies
     - Additional business rules
   
Step 4: Test your changes
   → Run on sample data first
   → Verify new checks appear in output
   
Step 5: Deploy to production
   → Document your changes
   → Add to your data pipeline

Files used:

  • ✓ ALL of them! You're going deep
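The actual customization happens in the R script, following the examples in the guide. As a language-neutral picture of what a validation rule usually consists of (an ID, a description, and a pass/fail test applied to each record), here is a small Python sketch. The `Rule` class, the rule IDs, and the record fields are all hypothetical and do not come from the package.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    rule_id: str
    description: str
    check: Callable[[dict], bool]  # returns True when the record passes


def run_rules(records, rules):
    """Apply every rule to every record; return one summary row per rule."""
    results = []
    for rule in rules:
        failures = [r for r in records if not rule.check(r)]
        results.append({
            "rule_id": rule.rule_id,
            "description": rule.description,
            "checked": len(records),
            "failed": len(failures),
            "status": "PASS" if not failures else "FAIL",
        })
    return results


# Hypothetical custom rules and records, for illustration only.
rules = [
    Rule("CUSTOM-01", "Coverage amount is positive",
         lambda r: r.get("coverage_amount", 0) > 0),
    Rule("CUSTOM-02", "Member ID is present",
         lambda r: bool(r.get("member_id"))),
]

records = [
    {"member_id": "M001", "coverage_amount": 50000},
    {"member_id": "", "coverage_amount": 25000},
]

results = run_rules(records, rules)
for row in results:
    print(row["rule_id"], row["status"], f'{row["failed"]}/{row["checked"]} failed')
```

Keeping each rule as data plus one small function makes Step 4 easier: a new check shows up as one more row in the results, which is simple to verify.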


The Information Flow

How Data Moves Through the System

1. YOU provide ────────► LDEx XML file
                         (Your enrollment data)
                                │
                                ▼
2. R Script reads ─────► Parses XML using xml2 library
                         Extracts: Members, Coverages, Dependents
                                │
                                ▼
3. Analysis runs ──────► Completeness Check
                         Validation Rules (10+)
                         Anomaly Detection (8+)
                         Business Rules (5+)
                         Statistical Profiling
                                │
                                ▼
4. Scoring happens ────► Weighted score calculation
                         Component scores ───► Score (0-100%)
                         Grade assignment ───► Letter grade (A-D)
                                │
                                ▼
5. Excel export ───────► 10 sheets created
                         Color coding applied
                         Formulas calculated
                                │
                                ▼
6. YOU receive ────────► LDEx_Data_Quality_Report.xlsx
                         (Ready to review and act on)
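Step 4 of the flow above combines component scores into one weighted score and then maps it to a letter grade. A minimal sketch of that arithmetic in Python follows. The weights and grade cut-offs here are assumptions chosen for illustration; the real values are defined in the R script and explained in the guide.

```python
# Hypothetical weights and grade cut-offs, NOT the values used by the R script.
WEIGHTS = {
    "completeness": 0.30,
    "validation": 0.40,
    "anomalies": 0.15,
    "business_rules": 0.15,
}
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C")]  # anything lower is "D"


def overall_score(component_scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(component_scores[name] * w for name, w in WEIGHTS.items())
    return weighted / total_weight


def letter_grade(score):
    """Map a 0-100 score to a letter grade from A to D."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "D"


scores = {
    "completeness": 95.0,
    "validation": 88.0,
    "anomalies": 70.0,
    "business_rules": 100.0,
}
total = overall_score(scores)
print(round(total, 1), letter_grade(total))
```

Dividing by the sum of the weights keeps the result on a 0-100 scale even if the weights are later changed and no longer add up to exactly 1.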

File Dependency Map

What Needs What


To RUN the analysis, you need:

  • ldex_data_quality_analysis.R ← THE ESSENTIAL FILE

  • Your LDEx XML file

  • R installed with required packages


To UNDERSTAND the analysis, you need:

  • README.md (quick start)

  • DATA_QUALITY_ANALYSIS_GUIDE.md (full details)


To SEE example output, you need:

  • LDEx_Data_Quality_Report_Sample.xlsx


To NAVIGATE everything, you need:

  • PACKAGE_INDEX.md


To USE AI for analysis, you need:

  • ADVANCED_LLM_PROMPT.md

  • Your actual Excel report


Common Usage Workflows

Workflow 1: First-Time User

1. README.md ..................... Understand what this is
2. LDEx_Data_Quality_Report_Sample.xlsx ... See example output
3. DATA_QUALITY_ANALYSIS_GUIDE.md . Learn the methodology
4. Install R and packages ........ Get environment ready
5. Run on sample_ldex_life_enrollment.xml ... Test it works
6. Run on YOUR data .............. Analyze real files

Workflow 2: Regular User

1. Place new LDEx file in folder
2. Run: source("ldex_data_quality_analysis.R")
3. Open: LDEx_Data_Quality_Report.xlsx
4. Check: Sheet 4 (Validation Results) for any FAILS
5. Check: Sheet 5 (Anomaly Detection) for HIGH severity
6. Make decisions based on findings
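Steps 4 and 5 of this workflow boil down to two filters: failed validations and high-severity anomalies. The Python sketch below shows that triage on rows already loaded into memory. The column names (`status`, `severity`, `rule`, `finding`) are hypothetical, so check the actual sheet headers in your report before adapting it.

```python
def needs_attention(validation_rows, anomaly_rows):
    """Return (failed validations, high-severity anomalies)."""
    fails = [r for r in validation_rows if r["status"].strip().upper() == "FAIL"]
    high = [r for r in anomaly_rows if r["severity"].strip().upper() == "HIGH"]
    return fails, high


# Hypothetical rows standing in for Sheet 4 and Sheet 5 of the report.
validation_rows = [
    {"rule": "Member ID present", "status": "PASS"},
    {"rule": "Date of birth valid", "status": "Fail"},
]
anomaly_rows = [
    {"finding": "Unusually large coverage amount", "severity": "HIGH"},
    {"finding": "Duplicate dependent name", "severity": "Low"},
]

fails, high = needs_attention(validation_rows, anomaly_rows)
print(len(fails), "failed validation(s);", len(high), "high-severity anomaly(ies)")
```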

Workflow 3: Developer Customizing

1. Review: Current validation rules in R script
2. Identify: What additional checks you need
3. Reference: Customization examples in GUIDE
4. Code: Add new rules to R script
5. Test: Run on sample data
6. Document: Update README with your additions
7. Deploy: To production pipeline

Workflow 4: Executive Review

1. Receive: Excel report from analyst
2. Open: Sheet 1 (Executive Summary)
3. Look at: Overall Score and Grade
4. Decide: 
   - Grade A/B → Approve for loading
   - Grade C → Review with team
   - Grade D → Reject, send back to source
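If the grade feeds an automated load pipeline, the decision table above can be written as a small lookup. This is a sketch in Python; the function name `decide` is made up for this example, while the grade-to-action mapping is the one listed in Workflow 4.

```python
# Grade-to-action mapping copied from Workflow 4.
DECISIONS = {
    "A": "Approve for loading",
    "B": "Approve for loading",
    "C": "Review with team",
    "D": "Reject, send back to source",
}


def decide(grade):
    """Return the recommended action for a letter grade (A-D)."""
    try:
        return DECISIONS[grade.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown grade: {grade!r}") from None


for g in ("A", "c", "D"):
    print(g, "->", decide(g))
```

Raising an error on an unknown grade, instead of defaulting to approval, keeps an unexpected value from slipping a bad file into production.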

Key Connections to Remember

1. R Script ↔ XML File

The R script's first 100 lines are dedicated to:
- Finding your XML file (line 36)
- Parsing it correctly (lines 95-99)
- Extracting data safely (lines 46-54)

CONNECTION: You MUST update line 36 with your file path

2. R Script ↔ Excel Output

The R script's last 200 lines create:
- All 10 Excel sheets
- Color formatting
- Data validation rules
- Summary calculations

CONNECTION: Line 39 determines output file name

3. Documentation ↔ Code

Every validation rule in the R script is explained in
DATA_QUALITY_ANALYSIS_GUIDE.md (lines 52-86):
- With examples of what passes/fails
- With business rationale

CONNECTION: Section numbers match rule IDs

4. Sample Files ↔ Real Usage

sample_ldex_life_enrollment.xml shows the expected structure.
Your real file should match this structure.
Otherwise, XML namespace errors can occur when the script parses it.

CONNECTION: Same XML structure = analysis works
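One quick way to spot a structure mismatch before running the analysis is to compare the root element and namespace of your file with those of the sample. The Python sketch below does this with the standard library. The namespaces and element names in the example strings are invented for illustration and are not the real LDEx namespace.

```python
import xml.etree.ElementTree as ET


def root_signature(xml_text):
    """Return (namespace, local_name) of the document's root element."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{"):
        # ElementTree writes namespaced tags as "{namespace}local_name".
        namespace, local_name = tag[1:].split("}", 1)
        return namespace, local_name
    return "", tag


# Invented documents: same root name, different namespaces.
sample = '<Enrollment xmlns="urn:example:ldex"><Member/></Enrollment>'
real = '<Enrollment xmlns="urn:example:other"><Member/></Enrollment>'

if root_signature(sample) != root_signature(real):
    print("Mismatch:", root_signature(sample), "vs", root_signature(real))
```

With real files, read each file's contents first and pass the text to `root_signature`. A matching signature does not prove the whole structure matches, but a mismatch is a strong sign that namespace errors will follow.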

The "Don't Forget" Checklist

When using this toolkit, remember these connections:

  • The R script needs the XML file path (line 36)

  • The R script creates the Excel output (line 39)

  • The documentation explains the code (reference guide)

  • The sample output shows expected results (preview)

  • The README is your quick reference (start here)

  • The GUIDE has customization examples (for extending)

  • The sample XML is your format reference (structure guide)


Quick Reference: "Which File Do I Need?"

I want to...                Use this file
Get started quickly         README.md
See example output          LDEx_Data_Quality_Report_Sample.xlsx
Run analysis                ldex_data_quality_analysis.R
Understand methodology      DATA_QUALITY_ANALYSIS_GUIDE.md
Find a specific feature     PACKAGE_INDEX.md
Customize validation rules  DATA_QUALITY_ANALYSIS_GUIDE.md + R script
Use AI for analysis         ADVANCED_LLM_PROMPT.md
Understand XML format       sample_ldex_life_enrollment.xml
Regenerate sample report    generate_sample_report.py

The Bottom Line


Think of it this way:

  • Documentation files = Your instruction manual

  • R script = The engine that does the work

  • Sample files = Training wheels to learn

  • Your XML file = The fuel you put in

  • Excel output = The insights you get out



For more help, reach out to info@dataprepr.ai


 
 
 
