Model
To understand the factors contributing to the severity of traffic accidents in the United States, I fit a linear regression model using a sample of 2 million accident records from the US Accidents dataset (2016–2023).
The response variable was Severity, which ranges from 1 (least severe) to 4 (most severe). The predictors included:
Start Time: The time of day when the accident began
Distance: The distance the accident covered, in miles
Weather Condition: The reported weather at the time of the accident
Data Generating Mechanism
The fitted equation, simplified to highlight key relationships, is:
\[ \widehat{\text{Severity}} = 3.493 - 7.003 \times 10^{-10} \cdot \text{Start\_Time} + 0.0415 \cdot \text{Distance (mi)} - 0.1003 \cdot \text{Weather\_ConditionClear} - 0.1780 \cdot \text{Weather\_ConditionFog} + 0.0118 \cdot \text{Weather\_ConditionLight\ Snow} + 0.0305 \cdot \text{Weather\_ConditionHeavy\ Rain} \]
The model includes over 100 levels of Weather_Condition, represented as dummy variables. Only a subset is shown here.
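To make the dummy coding concrete, the simplified equation can be evaluated by hand. The sketch below is only an illustration: the function name `predict_severity` and the dictionary layout are hypothetical, the coefficients are copied from the equation above, and only the four weather levels shown there are covered. The value passed for Start_Time is a placeholder, since its units are not stated in this section.

```python
# Coefficients copied from the simplified fitted equation (subset of weather levels).
INTERCEPT = 3.493
B_START_TIME = -7.003e-10
B_DISTANCE = 0.0415
WEATHER_COEFS = {
    "Clear": -0.1003,
    "Fog": -0.1780,
    "Light Snow": 0.0118,
    "Heavy Rain": 0.0305,
}


def predict_severity(start_time, distance_mi, weather=None):
    """Evaluate the simplified equation for a single accident.

    weather=None stands for the reference category, for which every
    weather dummy is 0. Levels missing from WEATHER_COEFS raise KeyError
    because their coefficients are not reported in this section.
    """
    weather_term = 0.0 if weather is None else WEATHER_COEFS[weather]
    return (
        INTERCEPT
        + B_START_TIME * start_time
        + B_DISTANCE * distance_mi
        + weather_term
    )


# With the other predictors held fixed, switching the weather level only
# changes which dummy coefficient is added, so the Start_Time placeholder
# (0.0 here) cancels out of the difference.
clear = predict_severity(0.0, 1.0, "Clear")
fog = predict_severity(0.0, 1.0, "Fog")
print(round(clear - fog, 4))  # 0.0777
```

Each weather coefficient is a shift relative to the reference category, so the difference between two non-reference levels is just the difference of their coefficients.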
Table: Estimated Regression Coefficients with Confidence Intervals and Statistical Significance
Variable | Estimate | Std. Error | p-value | 95% CI Lower | 95% CI Upper |
---|---|---|---|---|---|
(Intercept) | 3.493 | 0.024 | <0.001 | 3.446 | 3.540 |
Start_Time | -7.0e-10 | 1.63e-10 | <0.001 | -1.03e-09 | -3.73e-10 |
Distance(mi) | 0.0415 | 0.003 | <0.001 | 0.035 | 0.048 |
Weather_ConditionClear | -0.100 | 0.010 | <0.001 | -0.119 | -0.081 |
Weather_ConditionFog | -0.178 | 0.033 | <0.001 | -0.242 | -0.114 |
Weather_ConditionLight Snow | 0.012 | 0.018 | 0.503 | -0.023 | 0.046 |
Weather_ConditionHeavy Rain | 0.030 | 0.017 | 0.075 | -0.003 | 0.063 |
Only selected variables shown; full model includes 100+ weather conditions as dummies.
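The interval and p-value columns can be approximately reproduced from the estimate and standard error alone. The sketch below assumes a large-sample normal (Wald) approximation with a critical value of 1.96; the helper name `wald_summary` is hypothetical, and because the inputs are the rounded table values, results should agree with the table only up to rounding.

```python
import math

# (estimate, std. error) pairs copied from the table above.
ROWS = {
    "(Intercept)": (3.493, 0.024),
    "Distance(mi)": (0.0415, 0.003),
    "Weather_ConditionFog": (-0.178, 0.033),
    "Weather_ConditionLight Snow": (0.012, 0.018),
    "Weather_ConditionHeavy Rain": (0.030, 0.017),
}

Z_95 = 1.96  # assumption: normal critical value for a 95% interval


def wald_summary(estimate, std_error):
    """Return (lower, upper, two-sided p-value) under a normal approximation."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return estimate - Z_95 * std_error, estimate + Z_95 * std_error, p_value


for name, (est, se) in ROWS.items():
    low, high, p = wald_summary(est, se)
    print(f"{name}: [{low:.3f}, {high:.3f}], p = {p:.3f}")
```

An interval that contains zero corresponds to a p-value above 0.05, which is the pattern seen for Light Snow and Heavy Rain.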
This model suggests the following relationships between accident severity and key predictors:
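Distance (mi): Each additional mile covered by the accident is associated with an increase of about 0.04 in predicted severity (confidence interval 0.035 to 0.048), so more extensive incidents tend to be rated as more serious.
Start_Time: The coefficient is negative and statistically significant, but its magnitude per unit of Start_Time is negligible (about -7.0 × 10^-10). This suggests limited practical influence, although the size of the effect over a meaningful time span depends on the units in which Start_Time is recorded.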
Weather Conditions:
Clear and Fog conditions are associated with lower severity (by about 0.10 and 0.18, respectively) than the baseline category (possibly “Overcast” or another default in the model).
Light Snow has a small positive coefficient (0.012), but the effect is not statistically significant (p = 0.503).
Heavy Rain has a positive coefficient (0.030), indicating increased severity, but the effect is not significant at the 0.05 level (p ≈ 0.075).
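Because the Start_Time coefficient is expressed per unit of the predictor, its practical size is easier to judge after rescaling. The sketch below is illustrative only: it assumes, purely for the sake of the arithmetic, that Start_Time is recorded in seconds, which this section does not state. The function name `severity_change` is hypothetical.

```python
B_START_TIME = -7.003e-10  # per one unit of Start_Time, from the fitted equation

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25


def severity_change(units_elapsed, coef=B_START_TIME):
    """Predicted change in severity when Start_Time increases by units_elapsed."""
    return coef * units_elapsed


# ASSUMPTION (not confirmed by the source): Start_Time is measured in seconds.
per_day = severity_change(SECONDS_PER_DAY)
per_year = severity_change(SECONDS_PER_DAY * DAYS_PER_YEAR)

print(f"per day:  {per_day:.2e}")
print(f"per year: {per_year:.4f}")
```

If Start_Time were recorded in different units, the same function applies with a different `units_elapsed`, and the rescaled effect would change accordingly.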
Overall, the model indicates that certain weather conditions and distance covered play more meaningful roles in predicting accident severity, while the time of day has only a minor effect.