Root Cause Analysis Pipeline Documentation
Input Files and Configuration
graph TB
subgraph "Configuration Files"
M["metrics_config.csv"]
H["hypothesis_config.csv"]
end
subgraph "Performance Data"
T["Territory_Metrics.csv"]
end
subgraph "External Data Sources"
P["Product_Priority_and_CLI_Info.csv"]
C["CLI_per_Account_Manager.csv"]
I["CI_per_Key_Initiative.csv"]
end
subgraph "Pipeline Components"
R["RootCauseAnalysisPipeline"]
V["RCAVisualizer"]
end
subgraph "Output Files"
O1["rca_intermediate_*.csv"]
O2["rca_results_*.csv"]
O3["visualizations/*.png"]
end
M --> R
H --> R
T --> R
P --> |"external_data.product_priority_cli"| R
C --> |"external_data.cli_per_am"| R
I --> |"external_data.ci_key_initiative"| R
R --> V
R --> O1
R --> O2
V --> O3
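The input wiring in the diagram above can be sketched as a small lookup step. This is only an illustration: the file names and the `external_data` keys come from the diagram, while `resolve_inputs`, the two constants, and the idea of keeping all files in one directory are assumptions made for the example.

```python
from pathlib import Path

# Keys mirror the edge labels in the diagram (external_data.<key>);
# file names mirror the diagram nodes.
EXTERNAL_DATA_FILES = {
    "product_priority_cli": "Product_Priority_and_CLI_Info.csv",
    "cli_per_am": "CLI_per_Account_Manager.csv",
    "ci_key_initiative": "CI_per_Key_Initiative.csv",
}
CONFIG_FILES = {
    "metrics_config": "metrics_config.csv",
    "hypothesis_config": "hypothesis_config.csv",
    "territory_metrics": "Territory_Metrics.csv",
}


def resolve_inputs(data_dir):
    """Hypothetical helper: map every input name to a path, failing early if a file is missing."""
    base = Path(data_dir)
    all_files = {**CONFIG_FILES, **EXTERNAL_DATA_FILES}
    paths = {name: base / file_name for name, file_name in all_files.items()}

    missing = sorted(str(p) for p in paths.values() if not p.is_file())
    if missing:
        raise FileNotFoundError("Missing input files: " + ", ".join(missing))

    # Only the three external sources go into the external_data mapping.
    external_data = {key: paths[key] for key in EXTERNAL_DATA_FILES}
    return paths, external_data
```

Checking for missing files before constructing the pipeline keeps a misnamed CSV from surfacing later as a harder-to-read error.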
Pipeline Class Structure
classDiagram
class RootCauseAnalysisPipeline {
+DataFrame metrics_config
+DataFrame hypothesis_config
+Dict hypotheses
+Dict external_data
+RCAVisualizer visualizer
+Dict territory_mapping
+Set plotted_hypotheses
+Set plotted_metrics
+List intermediate_results
+__init__(metrics_config_path, hypothesis_config_path, external_data)
+detect_anomalies(df, metric_column)
+run_all_metrics(territory_metrics_df)
-_is_root_cause(hypothesis_data, anomaly)
-_convert_to_numeric(value)
-_analyze_product_distribution(df, territory, territory_mapping)
-_map_territory_to_region(territory)
}
class Hypothesis {
+String name
+Dict config
+Dict external_data
+evaluate(territory, metric_name, higher_is_better, external_data)
}
class RCAVisualizer {
+Dict colors
+plot_metric_distribution(df, metric_column, anomalies, title)
+plot_hypothesis_distribution(data, hypothesis_name, metric_name, anomalous_territories)
+plot_root_cause_analysis(data, hypothesis_name, groupby_col)
-_save_figure(fig, name)
}
RootCauseAnalysisPipeline --> RCAVisualizer : uses
RootCauseAnalysisPipeline --> Hypothesis : creates
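For orientation, the class diagram above translates into roughly the following Python skeleton. Attribute and method names are taken from the diagram. The constructors of `Hypothesis` and `RCAVisualizer`, the initial values, and every method body are assumptions, and the private helpers are left out for brevity.

```python
class Hypothesis:
    def __init__(self, name, config, external_data=None):  # constructor is assumed
        self.name = name
        self.config = config
        self.external_data = external_data or {}

    def evaluate(self, territory, metric_name, higher_is_better, external_data):
        raise NotImplementedError("evaluation logic is not shown in the diagram")


class RCAVisualizer:
    def __init__(self):  # constructor is assumed
        self.colors = {}

    def plot_metric_distribution(self, df, metric_column, anomalies, title):
        raise NotImplementedError

    def plot_hypothesis_distribution(self, data, hypothesis_name, metric_name,
                                     anomalous_territories):
        raise NotImplementedError

    def plot_root_cause_analysis(self, data, hypothesis_name, groupby_col):
        raise NotImplementedError


class RootCauseAnalysisPipeline:
    def __init__(self, metrics_config_path, hypothesis_config_path, external_data):
        # The diagram lists both configs as DataFrames; loading is omitted here.
        self.metrics_config = None
        self.hypothesis_config = None
        self.hypotheses = {}               # presumably Hypothesis objects created by the pipeline
        self.external_data = external_data
        self.visualizer = RCAVisualizer()  # "uses" relationship
        self.territory_mapping = {}
        self.plotted_hypotheses = set()
        self.plotted_metrics = set()
        self.intermediate_results = []

    def detect_anomalies(self, df, metric_column):
        raise NotImplementedError

    def run_all_metrics(self, territory_metrics_df):
        raise NotImplementedError
```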
Anomaly Detection Flow
graph TD
A[Load Data & Config] --> B[Initialize Pipeline]
B --> C[Detect Anomalies]
C --> D[Statistical Tests]
D --> E1{"|Z-Score| > 1.96?"}
D --> E2{Significant Dimensional Difference?}
D --> E3{Historical Pattern Match?}
E1 --> |Yes| F[Flag as Anomaly]
E2 --> |Yes| F
E3 --> |Yes| F
E1 --> |No| G[Not Anomaly]
E2 --> |No| G
E3 --> |No| G
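The z-score branch of the flow above can be illustrated with a minimal sketch. The 1.96 cutoff comes from the diagram and corresponds to a two-sided 95% interval under a normal distribution. The function name, the dictionary input, and the use of the population standard deviation are assumptions. The dimensional and historical checks are not specified in the diagram and are not covered here.

```python
from statistics import fmean, pstdev


def zscore_anomalies(metric_by_territory, threshold=1.96):
    """Return {territory: z_score} for territories with |z| above the threshold."""
    values = list(metric_by_territory.values())
    if len(values) < 2:
        return {}

    mean = fmean(values)
    std = pstdev(values, mu=mean)  # population std; sample std is an equally valid choice
    if std == 0:
        return {}  # all territories identical, so nothing can be flagged

    z_scores = {t: (v - mean) / std for t, v in metric_by_territory.items()}
    return {t: z for t, z in z_scores.items() if abs(z) > threshold}
```

The sign of the returned z-score is kept so that later steps can distinguish positive from negative deviations.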
Root Cause Analysis Process
flowchart TD
A[Start run_all_metrics] --> B[Load Metrics Config]
B --> C[For Each Metric]
C --> D[Detect Anomalies]
D --> E[Plot Metric Distribution]
E --> F[For Each Anomaly]
F --> G[Get Hypothesis]
G --> H[Evaluate Hypothesis]
H --> I[Calculate Confidence]
I --> J{Is Root Cause?}
J --> |Yes| K[Add to Final Results]
J --> |No| L[Add to Intermediate]
K & L --> M[Next Anomaly]
M --> |More Anomalies| F
M --> |Done| N[Next Metric]
N --> |More Metrics| C
N --> |Done| O[Save Results]
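The nested loop in the flowchart above can be sketched as follows. The four callables are placeholders injected for the example, not the pipeline's real methods, and the plotting and saving steps are omitted. What the sketch does show is the split between final and intermediate results.

```python
def run_all_metrics_sketch(metrics, detect_anomalies, get_hypothesis, is_root_cause):
    """Walk metrics and their anomalies, splitting evaluations into final and intermediate.

    All four arguments are hypothetical stand-ins:
      metrics          - iterable of metric names
      detect_anomalies - metric -> iterable of anomalies
      get_hypothesis   - metric -> callable(anomaly) returning a dict of evaluation fields
      is_root_cause    - (evaluation, anomaly) -> bool
    """
    final_results, intermediate_results = [], []

    for metric in metrics:                      # "For Each Metric"
        for anomaly in detect_anomalies(metric):  # "For Each Anomaly"
            evaluate = get_hypothesis(metric)
            evaluation = evaluate(anomaly)      # assumed to include the confidence score
            record = {"metric": metric, "anomaly": anomaly, **evaluation}

            if is_root_cause(evaluation, anomaly):
                final_results.append(record)
            else:
                intermediate_results.append(record)

    return final_results, intermediate_results
```

Under this reading, the two lists correspond to the `rca_results_*.csv` and `rca_intermediate_*.csv` outputs respectively.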
Confidence Score Calculation
graph TD
A[Anomaly Detected] --> B{Calculate Confidence}
B --> C1[Statistical Significance]
B --> C2[Dimensional Analysis]
B --> C3[Historical Patterns]
C1 & C2 & C3 --> D[Weighted Score]
D --> E{Score > Threshold?}
E --> |Yes| F[Investigate Root Cause]
E --> |No| G[No Further Action]
F --> H{Positive or Negative?}
H --> |Positive| I[Best Practice Analysis]
H --> |Negative| J[Root Cause Analysis]
I & J --> K[Generate Insights]
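A weighted score of the kind shown above might be computed as in this sketch. The diagram names the three components but gives no weights, scale, or threshold, so the normalized weighted average, the component names, and the numbers in the usage example are all illustrative assumptions.

```python
def weighted_confidence(components, weights):
    """Weighted average of component scores, normalized by the weights actually used."""
    total_weight = sum(weights[name] for name in components)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(score * weights[name] for name, score in components.items())
    return weighted_sum / total_weight


def should_investigate(score, threshold):
    """Mirror the 'Score > Threshold?' decision node."""
    return score > threshold


# Illustrative values only; none of these numbers come from the pipeline.
example_components = {"statistical": 1.0, "dimensional": 0.5, "historical": 0.0}
example_weights = {"statistical": 0.5, "dimensional": 0.3, "historical": 0.2}
example_score = weighted_confidence(example_components, example_weights)
```

Normalizing by the total weight keeps the score on the same scale as its components even when the weights do not sum to one.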
Visualization Generation
flowchart TD
A[Start Visualization] --> B{Plot Type}
B --> C[Metric Distribution]
B --> D[Hypothesis Distribution]
B --> E[Root Cause Analysis]
C --> F[Create Bar Plot]
F --> G[Add Global Mean]
G --> H[Color by Performance]
D --> I[Check Data Type]
I --> |Categorical| J[Create Group Plot]
I --> |Numeric| K[Create Bar Plot]
J & K --> L[Add Root Cause Markers]
E --> M[Create Region Subplots]
M --> N[Add Global Reference]
N --> O[Highlight Anomalies]
H & L & O --> P[Save Figure]
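The metric-distribution branch above (bar plot, global mean line, color by performance) could look roughly like this sketch. It assumes matplotlib is available. The function names, the color choices, and the rule "favorable relative to the global mean" are assumptions rather than the behavior of `RCAVisualizer`.

```python
def performance_colors(values, higher_is_better=True, good="tab:green", bad="tab:red"):
    """Return (global_mean, colors), coloring each value by its side of the global mean."""
    mean = sum(values) / len(values)
    colors = []
    for value in values:
        favorable = value >= mean if higher_is_better else value <= mean
        colors.append(good if favorable else bad)
    return mean, colors


def plot_metric_distribution_sketch(territories, values, title, out_path,
                                    higher_is_better=True):
    """Bar plot with a dashed global-mean line, saved to out_path."""
    # Imported lazily so the color helper works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, suitable for writing files
    import matplotlib.pyplot as plt

    mean, colors = performance_colors(values, higher_is_better)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(territories, values, color=colors)
    ax.axhline(mean, color="black", linestyle="--", label="Global mean")
    ax.set_title(title)
    ax.legend()
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)  # free the figure once it is written
```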
Root Cause Investigation
graph TD
A[Root Cause Investigation] --> B{Check Deviation Type}
B --> |Positive| C1[Best Practice Check]
B --> |Negative| C2[Issue Check]
C1 --> D1{Meets Thresholds?}
C2 --> D2{Meets Thresholds?}
D1 --> |Yes| E1[Flag as Best Practice]
D2 --> |Yes| E2[Flag as Root Cause]
D1 & D2 --> |No| F[Not Significant]
E1 --> G[Generate Best Practice Report]
E2 --> H[Generate Root Cause Report]
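The branching above reduces to a small classification step, sketched here. The diagram does not define "positive" or which thresholds apply. This example assumes that a positive deviation means a favorable one once `higher_is_better` is taken into account, and it treats the threshold check as a boolean computed elsewhere.

```python
def oriented_deviation(value, reference, higher_is_better=True):
    """Deviation from the reference, with its sign flipped so that positive means favorable."""
    deviation = value - reference
    return deviation if higher_is_better else -deviation


def classify_deviation(deviation, meets_thresholds):
    """Map an oriented deviation to one of the three outcomes in the diagram."""
    if deviation == 0 or not meets_thresholds:
        return "not_significant"
    return "best_practice" if deviation > 0 else "root_cause"
```

For a metric where lower is better, a value below the reference yields a positive oriented deviation and is therefore a best-practice candidate.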