Root Cause Analysis Pipeline Documentation

Input Files and Configuration

graph TB
    subgraph "Configuration Files"
        M["metrics_config.csv"]
        H["hypothesis_config.csv"]
    end
    
    subgraph "Performance Data"
        T["Territory_Metrics.csv"]
    end
    
    subgraph "External Data Sources"
        P["Product_Priority_and_CLI_Info.csv"]
        C["CLI_per_Account_Manager.csv"]
        I["CI_per_Key_Initiative.csv"]
    end
    
    subgraph "Pipeline Components"
        R["RootCauseAnalysisPipeline"]
        V["RCAVisualizer"]
    end
    
    subgraph "Output Files"
        O1["rca_intermediate_*.csv"]
        O2["rca_results_*.csv"]
        O3["visualizations/*.png"]
    end

    M --> R
    H --> R
    T --> R
    P --> |"external_data.product_priority_cli"| R
    C --> |"external_data.cli_per_am"| R
    I --> |"external_data.ci_key_initiative"| R
    
    R --> V
    R --> O1
    R --> O2
    V --> O3
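The edge labels above indicate that the three external CSV files reach the pipeline as entries of the `external_data` dictionary, under the keys `product_priority_cli`, `cli_per_am` and `ci_key_initiative`. The following is a minimal, dependency-free sketch of assembling that dictionary. It assumes each file is a plain CSV with a header row. The helper `load_external_data`, the injectable `open_file` parameter and the sample column names are hypothetical. The class diagram types the configuration tables as DataFrames, and lists of dicts are used here only to keep the sketch self-contained.

```python
import csv
import io
from typing import Callable, Dict, List, TextIO

# Keys follow the edge labels in the diagram; file names follow the diagram nodes.
EXTERNAL_SOURCES = {
    "product_priority_cli": "Product_Priority_and_CLI_Info.csv",
    "cli_per_am": "CLI_per_Account_Manager.csv",
    "ci_key_initiative": "CI_per_Key_Initiative.csv",
}


def load_external_data(open_file: Callable[..., TextIO] = open) -> Dict[str, List[Dict[str, str]]]:
    """Hypothetical helper: read each external CSV into a list of row dicts.

    `open_file` is injectable so the sketch can run without files on disk.
    """
    external_data = {}
    for key, filename in EXTERNAL_SOURCES.items():
        with open_file(filename, newline="") as handle:
            external_data[key] = list(csv.DictReader(handle))
    return external_data


# Usage with in-memory stand-ins; the column names below are invented for illustration.
SAMPLE_FILES = {
    "Product_Priority_and_CLI_Info.csv": "product,priority,cli\nWidget,1,0.8\n",
    "CLI_per_Account_Manager.csv": "account_manager,cli\nAM-01,0.7\n",
    "CI_per_Key_Initiative.csv": "initiative,ci\nGrowth,0.9\n",
}


def fake_open(name: str, newline: str = "") -> io.StringIO:
    return io.StringIO(SAMPLE_FILES[name], newline=newline)


external_data = load_external_data(fake_open)
# external_data["cli_per_am"] == [{"account_manager": "AM-01", "cli": "0.7"}]
```

The resulting dictionary would then be passed as the `external_data` argument of `RootCauseAnalysisPipeline.__init__`, alongside the two configuration file paths.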

Pipeline Class Structure

classDiagram
    class RootCauseAnalysisPipeline {
        +DataFrame metrics_config
        +DataFrame hypothesis_config
        +Dict hypotheses
        +Dict external_data
        +RCAVisualizer visualizer
        +Dict territory_mapping
        +Set plotted_hypotheses
        +Set plotted_metrics
        +List intermediate_results
        +__init__(metrics_config_path, hypothesis_config_path, external_data)
        +detect_anomalies(df, metric_column)
        +run_all_metrics(territory_metrics_df)
        -_is_root_cause(hypothesis_data, anomaly)
        -_convert_to_numeric(value)
        -_analyze_product_distribution(df, territory, territory_mapping)
        -_map_territory_to_region(territory)
    }

    class Hypothesis {
        +String name
        +Dict config
        +Dict external_data
        +evaluate(territory, metric_name, higher_is_better, external_data)
    }

    class RCAVisualizer {
        +Dict colors
        +plot_metric_distribution(df, metric_column, anomalies, title)
        +plot_hypothesis_distribution(data, hypothesis_name, metric_name, anomalous_territories)
        +plot_root_cause_analysis(data, hypothesis_name, groupby_col)
        -_save_figure(fig, name)
    }

    RootCauseAnalysisPipeline --> RCAVisualizer : uses
    RootCauseAnalysisPipeline --> Hypothesis : creates
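The diagram shows the pipeline creating `Hypothesis` objects and keeping them in its `hypotheses` dictionary. The sketch below mirrors the documented attributes (`name`, `config`, `external_data`) and the `evaluate(territory, metric_name, higher_is_better, external_data)` signature. The body of `evaluate`, the `build_hypotheses` helper and the config keys `name`, `source` and `column` are all assumptions made for illustration, not the pipeline's actual logic.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Hypothesis:
    """Mirrors the attributes in the class diagram; the evaluate() logic is assumed."""
    name: str
    config: Dict[str, Any]
    external_data: Dict[str, Any] = field(default_factory=dict)

    def evaluate(self, territory: str, metric_name: str, higher_is_better: bool,
                 external_data: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # ASSUMPTION: config["source"] names an external_data table and
        # config["column"] the field to collect; rows carry a "territory" key.
        data = external_data if external_data is not None else self.external_data
        rows = data.get(self.config.get("source"), [])
        values = [row[self.config["column"]] for row in rows
                  if row.get("territory") == territory]
        return {"hypothesis": self.name, "territory": territory, "metric": metric_name,
                "higher_is_better": higher_is_better, "values": values}


def build_hypotheses(hypothesis_config: List[Dict[str, Any]],
                     external_data: Dict[str, Any]) -> Dict[str, Hypothesis]:
    """Hypothetical stand-in for how the pipeline fills its `hypotheses` dict."""
    return {row["name"]: Hypothesis(row["name"], row, external_data)
            for row in hypothesis_config}


hypotheses = build_hypotheses(
    [{"name": "low_cli", "source": "cli_per_am", "column": "cli"}],
    {"cli_per_am": [{"territory": "North", "cli": "0.4"},
                    {"territory": "South", "cli": "0.9"}]},
)
result = hypotheses["low_cli"].evaluate("North", "revenue", higher_is_better=True)
# result["values"] == ["0.4"]
```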

Anomaly Detection Flow

graph TD
    A[Load Data & Config] --> B[Initialize Pipeline]
    B --> C[Detect Anomalies]
    C --> D[Statistical Tests]
    D --> E1{"|Z-Score| > 1.96?"}
    D --> E2{Significant Dimensional Difference?}
    D --> E3{Historical Pattern Match?}
    E1 --> |Yes| F[Flag as Anomaly]
    E2 --> |Yes| F
    E3 --> |Yes| F
    E1 --> |No| G[Not Anomaly]
    E2 --> |No| G
    E3 --> |No| G
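The z-score branch of this flow can be sketched in a few lines. A territory is flagged when its metric value lies more than 1.96 standard deviations from the mean across territories, which is the two-sided 95% cut-off under a normal assumption. This sketch covers only that branch. The dimensional and historical tests are not shown, and the choice of population standard deviation is an assumption, since the documentation does not specify the estimator used by `detect_anomalies`.

```python
import statistics
from typing import Dict, List

Z_THRESHOLD = 1.96  # two-sided 95% cut-off, as in the diagram


def detect_anomalies(values: Dict[str, float], threshold: float = Z_THRESHOLD) -> List[dict]:
    """Flag territories whose metric value has |z| > threshold.

    ASSUMPTION: population standard deviation; the pipeline may use another estimator.
    """
    data = list(values.values())
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    if std == 0:
        return []  # every territory has the same value, so nothing deviates
    anomalies = []
    for territory, value in values.items():
        z = (value - mean) / std
        if abs(z) > threshold:
            anomalies.append({"territory": territory, "value": value, "z_score": z})
    return anomalies
```

For example, with nine territories at 10 and one at 50, the mean is 14 and the standard deviation is 12, so only the outlier is flagged, with a z-score of 3.0.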

Root Cause Analysis Process

flowchart TD
    A[Start run_all_metrics] --> B[Load Metrics Config]
    B --> C[For Each Metric]
    
    C --> D[Detect Anomalies]
    D --> E[Plot Metric Distribution]
    
    E --> F[For Each Anomaly]
    F --> G[Get Hypothesis]
    G --> H[Evaluate Hypothesis]
    
    H --> I[Calculate Confidence]
    I --> J{Is Root Cause?}
    
    J --> |Yes| K[Add to Final Results]
    J --> |No| L[Add to Intermediate]
    
    K & L --> M[Next Anomaly]
    M --> |More Anomalies| F
    M --> |Done| N[Next Metric]
    
    N --> |More Metrics| C
    N --> |Done| O[Save Results]
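The nested loop in this flowchart can be sketched as follows. Each anomaly is evaluated against every hypothesis configured for the metric, and the result lands either in the final results or in the intermediate list. This is a simplified illustration, not the pipeline's `run_all_metrics`. The plotting and saving steps are omitted, the mapping from metric to hypotheses is assumed, and a fixed `threshold` stands in for the undocumented `_is_root_cause` check.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: an evaluator maps (territory, metric_name) to a confidence in [0, 1].
Evaluator = Callable[[str, str], float]


def run_all_metrics(
    metrics: Dict[str, Dict[str, float]],
    hypotheses: Dict[str, Dict[str, Evaluator]],
    detect: Callable[[Dict[str, float]], List[dict]],
    threshold: float = 0.7,
) -> Tuple[List[dict], List[dict]]:
    """Return (final_results, intermediate_results) following the flowchart.

    metrics:    {metric_name: {territory: value}}
    hypotheses: {metric_name: {hypothesis_name: evaluator}}  (assumed mapping)
    detect:     anomaly detector returning dicts with a "territory" key
    threshold:  hypothetical stand-in for _is_root_cause
    """
    final_results, intermediate_results = [], []
    for metric_name, values in metrics.items():            # For Each Metric
        for anomaly in detect(values):                      # For Each Anomaly
            for hyp_name, evaluate in hypotheses.get(metric_name, {}).items():
                confidence = evaluate(anomaly["territory"], metric_name)
                record = {"metric": metric_name, "territory": anomaly["territory"],
                          "hypothesis": hyp_name, "confidence": confidence}
                if confidence >= threshold:                 # Is Root Cause?
                    final_results.append(record)
                else:
                    intermediate_results.append(record)
    return final_results, intermediate_results
```

Keeping the rejected candidates in a separate list mirrors the split between the `rca_results_*.csv` and `rca_intermediate_*.csv` outputs in the input/output diagram.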

Confidence Score Calculation

graph TD
    A[Anomaly Detected] --> B{Calculate Confidence}
    B --> C1[Statistical Significance]
    B --> C2[Dimensional Analysis]
    B --> C3[Historical Patterns]
    C1 & C2 & C3 --> D[Weighted Score]
    D --> E{Score > Threshold?}
    E --> |Yes| F[Investigate Root Cause]
    E --> |No| G[No Further Action]
    F --> H{Positive or Negative?}
    H --> |Positive| I[Best Practice Analysis]
    H --> |Negative| J[Root Cause Analysis]
    I & J --> K[Generate Insights]
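The confidence calculation combines three component scores into a weighted score, compares it with a threshold, and then routes the anomaly by the direction of its deviation. The sketch below assumes each component is already normalized to [0, 1]. The weights, the threshold and the reading of "positive" as a favourable deviation given `higher_is_better` are hypothetical values and interpretations, because the documentation does not state them.

```python
from typing import Dict

# Hypothetical weights and threshold; the documentation does not give the real values.
WEIGHTS = {"statistical": 0.5, "dimensional": 0.3, "historical": 0.2}
THRESHOLD = 0.6


def confidence_score(components: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average of component scores (missing components count as 0)."""
    total = sum(weights.values())
    return sum(w * components.get(name, 0.0) for name, w in weights.items()) / total


def next_step(components: Dict[str, float], deviation: float,
              higher_is_better: bool = True, threshold: float = THRESHOLD) -> str:
    """Route an anomaly as in the diagram."""
    if confidence_score(components) <= threshold:
        return "no_further_action"
    # ASSUMPTION: "positive" means favourable once higher_is_better is taken into account.
    favourable = (deviation > 0) == higher_is_better
    return "best_practice_analysis" if favourable else "root_cause_analysis"
```

With full statistical and dimensional support but no historical match, the score is 0.8, so a drop in a higher-is-better metric would be sent to root cause analysis.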

Visualization Generation

flowchart TD
    A[Start Visualization] --> B{Plot Type}
    
    B --> C[Metric Distribution]
    B --> D[Hypothesis Distribution]
    B --> E[Root Cause Analysis]
    
    C --> F[Create Bar Plot]
    F --> G[Add Global Mean]
    G --> H[Color by Performance]
    
    D --> I[Check Data Type]
    I --> |Categorical| J[Create Group Plot]
    I --> |Numeric| K[Create Bar Plot]
    J & K --> L[Add Root Cause Markers]
    
    E --> M[Create Region Subplots]
    M --> N[Add Global Reference]
    N --> O[Highlight Anomalies]
    
    H & L & O --> P[Save Figure]
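The "Add Global Mean", "Color by Performance" and "Highlight Anomalies" steps reduce to computing a reference line and a colour for each bar. This sketch computes only those inputs and draws nothing. The palette is a hypothetical stand-in for `RCAVisualizer.colors`, and comparing each territory against the global mean with `higher_is_better` is an assumed colouring rule.

```python
import statistics
from typing import Dict, Iterable, Tuple

# Hypothetical palette standing in for RCAVisualizer.colors.
COLORS = {"good": "#2ca02c", "bad": "#d62728", "neutral": "#7f7f7f", "anomaly_edge": "black"}


def bar_styles(values: Dict[str, float], anomalies: Iterable[str] = (),
               higher_is_better: bool = True,
               colors: Dict[str, str] = COLORS) -> Tuple[float, Dict[str, str], Dict[str, str]]:
    """Return (global_mean, fill colour per territory, edge colour per territory)."""
    global_mean = statistics.fmean(values.values())
    flagged = set(anomalies)
    fills, edges = {}, {}
    for territory, value in values.items():
        if value == global_mean:
            fills[territory] = colors["neutral"]
        else:
            above = value > global_mean
            # Green when the deviation is in the desired direction, red otherwise.
            fills[territory] = colors["good"] if above == higher_is_better else colors["bad"]
        edges[territory] = colors["anomaly_edge"] if territory in flagged else "none"
    return global_mean, fills, edges
```

In matplotlib these values could be passed as the `color` and `edgecolor` sequences of `Axes.bar`, with `Axes.axhline(global_mean)` drawing the global reference line.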

Root Cause Investigation

graph TD
    A[Root Cause Investigation] --> B{Check Deviation Type}
    B --> |Positive| C1[Best Practice Check]
    B --> |Negative| C2[Issue Check]
    C1 --> D1{Meets Thresholds?}
    C2 --> D2{Meets Thresholds?}
    D1 --> |Yes| E1[Flag as Best Practice]
    D2 --> |Yes| E2[Flag as Root Cause]
    D1 & D2 --> |No| F[Not Significant]
    E1 --> G[Generate Best Practice Report]
    E2 --> H[Generate Root Cause Report]
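The investigation diagram applies a separate threshold check on each branch. Positive deviations go through a best-practice check, and negative deviations go through an issue check. The sketch below keeps that two-branch structure. The evidence keys (`confidence`, `effect_size`) and the threshold values are hypothetical, and the caller is assumed to sign `deviation` so that positive means favourable.

```python
from typing import Dict

# Hypothetical per-branch thresholds; the diagram only says "Meets Thresholds?".
BEST_PRACTICE_THRESHOLDS = {"confidence": 0.7, "effect_size": 0.5}
ISSUE_THRESHOLDS = {"confidence": 0.6, "effect_size": 0.3}


def meets_thresholds(evidence: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """True when every required evidence value reaches its minimum (missing counts as 0)."""
    return all(evidence.get(key, 0.0) >= minimum for key, minimum in thresholds.items())


def investigate(territory: str, deviation: float, evidence: Dict[str, float]) -> Dict[str, str]:
    """Classify a deviation as best practice, root cause, or not significant."""
    if deviation > 0:  # Positive branch: Best Practice Check
        flagged = meets_thresholds(evidence, BEST_PRACTICE_THRESHOLDS)
        label = "best_practice"
    else:              # Negative branch: Issue Check
        flagged = meets_thresholds(evidence, ISSUE_THRESHOLDS)
        label = "root_cause"
    return {"territory": territory, "classification": label if flagged else "not_significant"}
```

The returned classification would then select which report to generate: a best-practice report, a root-cause report, or none.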
