Model

To understand the factors contributing to the severity of traffic accidents in the United States, I fit a linear regression model using a sample of 2 million accident records from the US Accidents dataset (2016–2023).

The response variable was Severity, which ranges from 1 (least severe) to 4 (most severe). The predictors included:

Start Time: The time of day when the accident began

Distance: The length of road affected by the accident, in miles

Weather Condition: The reported weather at the time of the accident

Data Generating Mechanism

The fitted equation, simplified to highlight key relationships, is:

\[
\begin{aligned}
\widehat{\text{Severity}} = {}& 3.493 - 7.003 \times 10^{-10} \cdot \text{Start\_Time} + 0.0415 \cdot \text{Distance(mi)} \\
& - 0.1003 \cdot \text{Weather\_ConditionClear} - 0.1780 \cdot \text{Weather\_ConditionFog} \\
& + 0.0118 \cdot \text{Weather\_ConditionLight Snow} + 0.0305 \cdot \text{Weather\_ConditionHeavy Rain} + \cdots
\end{aligned}
\]

The model includes over 100 levels of Weather_Condition, represented as dummy variables. Only a subset is shown here.
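To make the dummy coding concrete, here is a minimal Python sketch that evaluates the partial equation above. It assumes treatment (reference-level) coding, in which the baseline weather level has every dummy set to 0. The names `REFERENCE`, `weather_dummies`, and `predict_severity` are hypothetical, and because only four weather coefficients are shown, the function rejects any other level rather than guess its coefficient.

```python
# Coefficients copied from the fitted equation (subset of terms only).
INTERCEPT = 3.493
BETA_START_TIME = -7.003e-10
BETA_DISTANCE_MI = 0.0415

# Treatment (dummy) coding: each listed level has its own coefficient,
# and the omitted reference level is absorbed into the intercept.
WEATHER_COEFS = {
    "Clear": -0.1003,
    "Fog": -0.1780,
    "Light Snow": 0.0118,
    "Heavy Rain": 0.0305,
}
REFERENCE = "reference"  # hypothetical label for the unreported baseline level


def weather_dummies(condition: str) -> dict[str, int]:
    """Return 0/1 indicators for the shown levels; the reference maps to all zeros."""
    if condition != REFERENCE and condition not in WEATHER_COEFS:
        raise ValueError(f"{condition!r} is not among the levels shown in this sketch")
    return {level: int(condition == level) for level in WEATHER_COEFS}


def predict_severity(start_time: float, distance_mi: float, condition: str) -> float:
    """Predicted severity from the partial equation (other weather terms omitted)."""
    dummies = weather_dummies(condition)
    weather_term = sum(WEATHER_COEFS[level] * d for level, d in dummies.items())
    return (INTERCEPT
            + BETA_START_TIME * start_time
            + BETA_DISTANCE_MI * distance_mi
            + weather_term)


if __name__ == "__main__":
    # Same Start_Time and distance, different weather: the gap is the Fog coefficient.
    fog = predict_severity(0.0, 1.0, "Fog")
    ref = predict_severity(0.0, 1.0, REFERENCE)
    print(round(fog - ref, 4))
```

Whatever values of Start_Time and distance are held fixed, the difference between two conditions equals the difference of their coefficients, which is why each dummy coefficient reads as a shift relative to the baseline.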

Table: Estimated Regression Coefficients with 95% Confidence Intervals and Statistical Significance

| Variable | Estimate | Std. Error | p-value | 95% CI Lower | 95% CI Upper |
|---|---:|---:|---:|---:|---:|
| (Intercept) | 3.493 | 0.024 | <0.001 | 3.446 | 3.540 |
| Start_Time | -7.0e-10 | 1.63e-10 | <0.001 | -1.03e-09 | -3.73e-10 |
| Distance(mi) | 0.0415 | 0.003 | <0.001 | 0.035 | 0.048 |
| Weather_ConditionClear | -0.100 | 0.010 | <0.001 | -0.119 | -0.081 |
| Weather_ConditionFog | -0.178 | 0.033 | <0.001 | -0.242 | -0.114 |
| Weather_ConditionLight Snow | 0.012 | 0.018 | 0.503 | -0.023 | 0.046 |
| Weather_ConditionHeavy Rain | 0.030 | 0.017 | 0.075 | -0.003 | 0.063 |

Only selected variables shown; full model includes 100+ weather conditions as dummies.
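The p-values and interval bounds in the table follow from each estimate and its standard error. The sketch below assumes Wald statistics with a standard normal reference distribution, a reasonable approximation with about 2 million records, though the software that produced the table may have used a t distribution. `wald_summary` is a hypothetical helper, and its inputs are the rounded table values, so results can differ from the reported figures in the third decimal place.

```python
from statistics import NormalDist

# Standard normal reference distribution; with ~2 million records the
# t distribution is practically identical to it.
Z = NormalDist()
Z_95 = Z.inv_cdf(0.975)  # critical value for a two-sided 95% interval


def wald_summary(estimate: float, std_error: float) -> tuple[float, float, float]:
    """Return (p_value, ci_low, ci_high) for a two-sided Wald test of beta = 0."""
    z = estimate / std_error
    p_value = 2 * (1 - Z.cdf(abs(z)))
    half_width = Z_95 * std_error
    return p_value, estimate - half_width, estimate + half_width


# Rounded estimates and standard errors taken from the table.
rows = {
    "Weather_ConditionFog": (-0.178, 0.033),
    "Weather_ConditionLight Snow": (0.012, 0.018),
    "Weather_ConditionHeavy Rain": (0.030, 0.017),
}

for name, (est, se) in rows.items():
    p, lo, hi = wald_summary(est, se)
    print(f"{name}: p = {p:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

For Light Snow and Heavy Rain the recomputed intervals straddle zero, matching their non-significant p-values in the table.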

This model suggests the following relationships between accident severity and key predictors:

Distance (mi): Each additional mile of road affected by the accident is associated with an increase of about 0.04 in predicted severity (95% CI 0.035 to 0.048), holding the other predictors fixed, so more extensive incidents tend to be rated as more serious.
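Because the model is linear, the Distance coefficient translates directly into a change in predicted severity. As a worked example, take a hypothetical 5-mile difference in affected road length (an illustrative value, not taken from the data), with Start_Time and weather held fixed:

\[ \Delta\widehat{\text{Severity}} = 0.0415 \times \Delta\text{Distance(mi)} = 0.0415 \times 5 \approx 0.21 \]

Using the table's confidence bounds (0.035 to 0.048) instead, the same 5-mile difference corresponds to roughly 0.18 to 0.24 on the 1 to 4 severity scale.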

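Start_Time: The time of day has a very small negative coefficient (about -7.0 × 10⁻¹⁰ per unit of Start_Time). It is statistically significant, but with roughly 2 million records even tiny effects reach significance, so the p-value alone says little about practical importance. The magnitude suggests limited practical influence, although how small the effect is in practice depends on the units in which Start_Time is recorded.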
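How small the Start_Time effect is in practice depends on its units, which the model summary does not state. The sketch below assumes Start_Time is measured in seconds and compares two hypothetical encodings: seconds since midnight (a pure time-of-day variable) and seconds since a fixed epoch (a full timestamp, in which case the coefficient would also absorb any long-run trend across 2016–2023). Neither encoding is confirmed by the source.

```python
# Coefficient on Start_Time from the fitted model (change per unit of Start_Time).
BETA_START_TIME = -7.003e-10

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

# Hypothetical encodings (assumptions, not confirmed by the source):
# - time of day: Start_Time ranges over one day of seconds
# - timestamp: Start_Time spans roughly seven years of the dataset
spans_in_seconds = {
    "one full day (time-of-day encoding)": SECONDS_PER_DAY,
    "about seven years (timestamp encoding)": 7 * SECONDS_PER_YEAR,
}

for label, span in spans_in_seconds.items():
    # Linear model: change in prediction = coefficient * change in Start_Time.
    change = BETA_START_TIME * span
    print(f"{label}: change in predicted severity = {change:.2e}")
```

Under the time-of-day assumption the implied change across an entire day is on the order of 10⁻⁵, while under the timestamp assumption the change across about seven years is roughly -0.15, comparable in size to the Clear coefficient. Confirming the encoding therefore matters before calling the effect negligible.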

Weather Conditions:

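Clear and Fog conditions are associated with lower severity than the baseline category (the omitted reference level, possibly “Overcast” or whichever level the model used as its default). Holding the other predictors fixed, predicted severity is about 0.10 lower under Clear and 0.18 lower under Fog.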

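Light Snow has a small positive coefficient (0.012), but it is not statistically significant (p = 0.503), and its confidence interval (-0.023 to 0.046) includes zero, so the data are consistent with no difference from the baseline.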

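Heavy Rain has a positive coefficient (0.030), suggesting somewhat higher severity, but the effect is only marginally significant: p ≈ 0.075 is above the conventional 0.05 threshold, and the confidence interval (-0.003 to 0.063) includes zero.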
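Under dummy coding, each weather coefficient measures a difference from the omitted baseline level with Start_Time and Distance held fixed, so a comparison between two non-baseline levels is the difference of their coefficients. For example, Fog versus Clear:

\[ \widehat{\text{Severity}}_{\text{Fog}} - \widehat{\text{Severity}}_{\text{Clear}} = \hat{\beta}_{\text{Fog}} - \hat{\beta}_{\text{Clear}} = -0.1780 - (-0.1003) = -0.0777 \]

The standard error of this difference requires the covariance between the two estimates, which the table does not report, so whether Fog and Clear differ significantly from each other cannot be read off the table alone.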

Overall, among the predictors shown, distance and certain weather conditions (notably Clear and Fog) play the more meaningful roles in predicting accident severity, while the time of day has only a minor effect.