Documentation: Life Insurance - LDEx Data Quality Analysis Package: How the Artifacts Connect
- Dataprepr Team
- Nov 22
The Complete Picture: Artifact Relationships
Think of this toolkit as a complete data quality system where each file plays a specific role. Everything connects to help you answer one question: "Is my LDEx enrollment data good enough to load into production?" The toolkit gives you an objective, comprehensive, repeatable answer to that question - every single time.
Here's how they all connect:
┌─────────────────────────────┐
│     YOUR LDEx XML FILE      │
│      (enrollment data)      │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│     ldex_data_quality_      │
│         analysis.R          │◄─────┐
│       (THE ANALYZER)        │      │
└──────────────┬──────────────┘      │
               │                     │
               ▼                     │
┌─────────────────────────────┐      │
│  Data Quality Report.xlsx   │      │
│        (THE OUTPUT)         │      │
└─────────────────────────────┘      │
                                     │
Documentation & Support Files ───────┘
• README.md ...................... Quick start guide
• DATA_QUALITY_ANALYSIS_GUIDE.md . Full methodology
• PACKAGE_INDEX.md ............... File reference
• ADVANCED_LLM_PROMPT.md ......... AI analysis helper
• sample_ldex_life_enrollment.xml. Example data
• LDEx_Data_Quality_Report_Sample.xlsx ... Example output
• generate_sample_report.py ...... Demo generator
The Three Core Components
1. THE INPUT → sample_ldex_life_enrollment.xml
What it is: Example LDEx enrollment file
Purpose: Shows you what data format the analyzer expects
Links to: The R script reads this (or your real file)
How to use: Replace this with your actual LDEx files
2. THE ANALYZER → ldex_data_quality_analysis.R
What it is: 700+ lines of R code that do all the work
Purpose: Reads XML, runs 30+ checks, generates Excel report
Links to:
Reads: Your LDEx XML file
Outputs: Excel report
References: All documentation files for methodology
How to use: Configure file paths, then run it
3. THE OUTPUT → LDEx_Data_Quality_Report.xlsx
What it is: 10-sheet comprehensive analysis report
Purpose: Shows all findings in business-friendly format
Links to: Generated by the R script
How to use: Open in Excel, review findings, take action
The Supporting Cast
Documentation Trio
README.md (The Quick Start)
Role: Your first stop
Links to: Points you to all other files
Use when: You're getting started or need quick reference
DATA_QUALITY_ANALYSIS_GUIDE.md (The Deep Dive)
Role: Complete methodology explanation
Links to: References the R script code blocks
Use when: You need to understand HOW it works
PACKAGE_INDEX.md (The Reference)
Role: Complete file catalog and navigation
Links to: Everything - it's the master index
Use when: You need to find something specific
Demo Files
LDEx_Data_Quality_Report_Sample.xlsx (The Preview)
Role: Shows you what the output looks like
Links to: This is what the R script produces
Use when: You want to see results WITHOUT running code
generate_sample_report.py (The Generator)
Role: Creates the sample Excel report
Links to: Generates the sample report file
Use when: You want to recreate the demo report
AI Helper
ADVANCED_LLM_PROMPT.md (The AI Guide)
Role: Instructions for using AI to analyze the report
Links to: Designed to work with the Excel output
Use when: You want AI assistance interpreting results
Three Ways to Use This Toolkit
OPTION 1: "Just Show Me" (5 minutes)
For business users who want to see what's possible
1. Open: LDEx_Data_Quality_Report_Sample.xlsx
2. Look at: Sheet 1 (Executive Summary)
3. Notice: Overall score, grade, key findings
4. Done! You've seen what the output looks like
Files used:
✓ LDEx_Data_Quality_Report_Sample.xlsx
✓ README.md (for interpretation)
OPTION 2: "Run It on My Data" (30 minutes)
For analysts who want to analyze their own LDEx files
Step 1: Install R and packages
→ Follow instructions in README.md
Step 2: Prepare your LDEx file
→ Place your XML file in a known location
→ Example: /data/my_enrollment.xml
Step 3: Configure the R script
→ Open: ldex_data_quality_analysis.R
→ Line 36: Set ldex_file path to YOUR file
→ Line 39: Set output_excel name
Step 4: Run the analysis
→ In R: source("ldex_data_quality_analysis.R")
→ Wait 5 seconds to 5 minutes (depending on file size)
Step 5: Review your report
→ Open: LDEx_Data_Quality_Report.xlsx
→ Start with Executive Summary sheet
→ Investigate any RED (fail) items
Files used:
✓ ldex_data_quality_analysis.R (THE MAIN SCRIPT)
✓ Your LDEx XML file
✓ README.md (for setup instructions)
✓ DATA_QUALITY_ANALYSIS_GUIDE.md (for understanding results)
Files produced:
✓ LDEx_Data_Quality_Report.xlsx (YOUR RESULTS)
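For orientation when reading the Executive Summary: this guide describes the overall score as a weighted combination of component scores, mapped to a letter grade from A to D. The Python sketch below shows one way such a calculation could work. The component names, the weights, and the grade cut-offs are all assumptions made up for illustration; the real values are defined in ldex_data_quality_analysis.R and may differ.

```python
# Hypothetical weights and grade cut-offs, for illustration only.
# The actual values live in ldex_data_quality_analysis.R.
WEIGHTS = {
    "completeness": 0.30,
    "validation": 0.30,
    "anomaly": 0.20,
    "business_rules": 0.20,
}
GRADE_CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C")]  # anything lower is "D"


def overall_score(component_scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)
    return weighted / total_weight


def letter_grade(score):
    """Map a 0-100 score to a letter grade using the cut-offs above."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "D"


# Made-up component scores, not taken from any real report.
scores = {"completeness": 96.0, "validation": 88.0,
          "anomaly": 75.0, "business_rules": 90.0}
total = overall_score(scores)
print(f"{total:.1f}% -> grade {letter_grade(total)}")
```

Dividing by the sum of the weights keeps the result on the 0-100 scale even if the weights are later changed so that they no longer add up to 1.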
OPTION 3: "Customize It" (2+ hours)
For developers who want to extend the toolkit
Step 1: Understand the baseline
→ Read: DATA_QUALITY_ANALYSIS_GUIDE.md (full methodology)
→ Study: ldex_data_quality_analysis.R (code structure)
→ Review: sample_ldex_life_enrollment.xml (data format)
Step 2: Run baseline analysis
→ Follow Option 2 steps above
→ Understand what the default analysis does
Step 3: Add your custom rules
→ Reference: DATA_QUALITY_ANALYSIS_GUIDE.md (Customization Guide section)
→ Examples included for:
- New validation rules
- Custom anomalies
- Additional business rules
Step 4: Test your changes
→ Run on sample data first
→ Verify new checks appear in output
Step 5: Deploy to production
→ Document your changes
→ Add to your data pipeline
Files used:
✓ ALL of them! You're going deep
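To make Steps 3 and 4 concrete, here is a minimal Python sketch of what a custom validation rule and its summary row might look like. It is not the R script's actual rule structure, which this article does not show. The rule ID "CUSTOM-01", the field name coverage_amount, and the sample records are hypothetical.

```python
def positive_coverage(record):
    """Pass when the (hypothetical) coverage_amount field is a number above zero."""
    try:
        return float(record.get("coverage_amount", "")) > 0
    except ValueError:
        return False  # blank or non-numeric values count as failures


# Each rule is (rule_id, description, check); check returns True on pass.
custom_rules = [
    ("CUSTOM-01", "Coverage amount is a positive number", positive_coverage),
]


def run_rules(records, rules):
    """Apply every rule to every record and return one summary row per rule."""
    results = []
    for rule_id, description, check in rules:
        failed = [r for r in records if not check(r)]
        results.append({
            "rule_id": rule_id,
            "description": description,
            "checked": len(records),
            "failed": len(failed),
            "status": "PASS" if not failed else "FAIL",
        })
    return results


# Made-up records standing in for parsed enrollment data.
sample_records = [
    {"member_id": "M001", "coverage_amount": "50000"},
    {"member_id": "M002", "coverage_amount": "0"},
    {"member_id": "M003", "coverage_amount": ""},
]
for row in run_rules(sample_records, custom_rules):
    print(row)
```

Keeping each rule as a small function plus an ID makes Step 4 easy to verify: a new rule should add exactly one new row to the results.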
The Information Flow
How Data Moves Through the System
1. YOU provide ────────► LDEx XML file
                         (Your enrollment data)
                              │
                              ▼
2. R Script reads ─────► Parses XML using xml2 library
                         Extracts: Members, Coverages, Dependents
                              │
                              ▼
3. Analysis runs ──────► Completeness Check
                         Validation Rules (10+)
                         Anomaly Detection (8+)
                         Business Rules (5+)
                         Statistical Profiling
                              │
                              ▼
4. Scoring happens ────► Weighted score calculation
                         Component scores ───► Score (0-100%)
                         Grade assignment ───► Letter grade (A-D)
                              │
                              ▼
5. Excel export ───────► 10 sheets created
                         Color coding applied
                         Formulas calculated
                              │
                              ▼
6. YOU receive ────────► LDEx_Data_Quality_Report.xlsx
                         (Ready to review and act on)
File Dependency Map
What Needs What
To RUN the analysis, you need:
ldex_data_quality_analysis.R ← THE ESSENTIAL FILE
Your LDEx XML file
R installed with required packages
To UNDERSTAND the analysis, you need:
README.md (quick start)
DATA_QUALITY_ANALYSIS_GUIDE.md (full details)
To SEE example output, you need:
LDEx_Data_Quality_Report_Sample.xlsx
To NAVIGATE everything, you need:
PACKAGE_INDEX.md
To USE AI for analysis, you need:
ADVANCED_LLM_PROMPT.md
Your actual Excel report
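The "to RUN the analysis" list above can be turned into a quick pre-flight check. The Python sketch below uses only the standard library and assumes R's command-line runner is available as Rscript; the function name and the default paths are illustrative. It does not check whether the required R packages are installed, so README.md remains the reference for that step.

```python
import shutil
from pathlib import Path


def missing_prerequisites(script_path, xml_path, r_command="Rscript"):
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    if not Path(script_path).is_file():
        problems.append(f"R script not found: {script_path}")
    if not Path(xml_path).is_file():
        problems.append(f"LDEx XML file not found: {xml_path}")
    if shutil.which(r_command) is None:
        problems.append(f"'{r_command}' is not on PATH (is R installed?)")
    return problems


# Example paths only; replace them with your own locations.
problems = missing_prerequisites("ldex_data_quality_analysis.R",
                                 "/data/my_enrollment.xml")
for problem in problems:
    print("MISSING:", problem)
if not problems:
    print("Ready to run the analysis.")
```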
Common Usage Workflows
Workflow 1: First-Time User
1. README.md ..................... Understand what this is
2. LDEx_Data_Quality_Report_Sample.xlsx ... See example output
3. DATA_QUALITY_ANALYSIS_GUIDE.md . Learn the methodology
4. Install R and packages ........ Get environment ready
5. Run on sample_ldex_life_enrollment.xml ... Test it works
6. Run on YOUR data .............. Analyze real files
Workflow 2: Regular User
1. Place new LDEx file in folder
2. Run: source("ldex_data_quality_analysis.R")
3. Open: LDEx_Data_Quality_Report.xlsx
4. Check: Sheet 4 (Validation Results) for any FAILS
5. Check: Sheet 5 (Anomaly Detection) for HIGH severity
6. Make decisions based on findings
Workflow 3: Developer Customizing
1. Review: Current validation rules in R script
2. Identify: What additional checks you need
3. Reference: Customization examples in GUIDE
4. Code: Add new rules to R script
5. Test: Run on sample data
6. Document: Update README with your additions
7. Deploy: To production pipeline
Workflow 4: Executive Review
1. Receive: Excel report from analyst
2. Open: Sheet 1 (Executive Summary)
3. Look at: Overall Score and Grade
4. Decide:
- Grade A/B → Approve for loading
- Grade C → Review with team
- Grade D → Reject, send back to source
Key Connections to Remember
1. R Script ↔ XML File
The R script's first 100 lines are dedicated to:
- Finding your XML file (line 36)
- Parsing it correctly (lines 95-99)
- Extracting data safely (lines 46-54)
CONNECTION: You MUST update line 36 with your file path
2. R Script ↔ Excel Output
The R script's last 200 lines create:
- All 10 Excel sheets
- Color formatting
- Data validation rules
- Summary calculations
CONNECTION: Line 39 determines output file name
3. Documentation ↔ Code
Every validation rule in the R script is explained in:
- DATA_QUALITY_ANALYSIS_GUIDE.md (lines 52-86)
- With examples of what passes/fails
- With business rationale
CONNECTION: Section numbers match rule IDs
4. Sample Files ↔ Real Usage
sample_ldex_life_enrollment.xml shows the structure
Your real file should match this structure
Otherwise, namespace errors occur
CONNECTION: Same XML structure = analysis works
The "Don't Forget" Checklist
When using this toolkit, remember these connections:
☐ The R script needs the XML file path (line 36)
☐ The R script creates the Excel output (line 39)
☐ The documentation explains the code (reference guide)
☐ The sample output shows expected results (preview)
☐ The README is your quick reference (start here)
☐ The GUIDE has customization examples (for extending)
☐ The sample XML is your format reference (structure guide)
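The last checklist item, together with the namespace warning under "Sample Files ↔ Real Usage", can be checked mechanically before a run. The Python sketch below compares only the root element's namespace and name between two XML documents. The element name and namespace URIs in the demo are invented placeholders, not the real LDEx schema. A fuller check would also compare child element names.

```python
import io
import xml.etree.ElementTree as ET


def root_signature(xml_source):
    """Return (namespace, local_name) for the root element of an XML document.

    xml_source may be a file path or a file-like object. ET.parse raises
    ET.ParseError if the document is not well-formed.
    """
    root = ET.parse(xml_source).getroot()
    if root.tag.startswith("{"):
        # ElementTree writes namespaced tags as "{namespace}local_name".
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = "", root.tag
    return namespace, local_name


def same_structure(sample_source, candidate_source):
    """True when both documents share the same root namespace and name."""
    return root_signature(sample_source) == root_signature(candidate_source)


# In-memory stand-ins with invented names; in practice, pass the paths to
# sample_ldex_life_enrollment.xml and to your own file instead.
sample = io.BytesIO(b'<Enrollment xmlns="urn:example:v1"><Member/></Enrollment>')
mine = io.BytesIO(b'<Enrollment xmlns="urn:example:v2"><Member/></Enrollment>')
print("Root elements match:", same_structure(sample, mine))
```

A namespace mismatch at the root is a cheap early signal that the analyzer's XML queries may not find the elements they expect.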
Quick Reference: "Which File Do I Need?"
I want to...                     Use this file
Get started quickly ............ README.md
See example output ............. LDEx_Data_Quality_Report_Sample.xlsx
Run analysis ................... ldex_data_quality_analysis.R
Understand methodology ......... DATA_QUALITY_ANALYSIS_GUIDE.md
Find a specific feature ........ PACKAGE_INDEX.md
Customize validation rules ..... DATA_QUALITY_ANALYSIS_GUIDE.md + R script
Use AI for analysis ............ ADVANCED_LLM_PROMPT.md
Understand XML format .......... sample_ldex_life_enrollment.xml
Regenerate sample report ....... generate_sample_report.py
The Bottom Line
Think of it this way:
Documentation files = Your instruction manual
R script = The engine that does the work
Sample files = Training wheels to learn
Your XML file = The fuel you put in
Excel output = The insights you get out
For more help, reach out to info@dataprepr.ai