Study notes: MLOps Week 3-2 Forecasting

Week 3-2 of the AWS MLOps course: Amazon Forecast and time-series data

Forecasting and Amazon Forecast

Overview

  • Forecasting predicts future values based on historical data
  • Patterns include
    • Trends
    • Seasonal: patterns that repeat with the seasons or another fixed calendar period
    • Cyclical: other repeating patterns that are not tied to a fixed calendar period
    • Irregular: variations that appear to be random
  • Examples
    • Sales and demand forecast
    • Energy consumption
    • Inventory projections
    • Weather forecast

Processing time series data

  • Time series data is captured in sequence over time

  • Handle missing data

    • Forward fill
    • Backward fill
    • Moving average
    • Interpolation: linear, spline, or polynomial
    • Sometimes zero is a good fill value
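The fill strategies above can be compared side by side in pandas. This is a minimal sketch on a made-up daily series (the values, the window size of 3, and the column names are illustrative assumptions, not from the course material):

```python
import numpy as np
import pandas as pd

# Hypothetical daily series; NaN marks a missing reading.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0], index=idx)

filled = pd.DataFrame({
    "original": s,
    "ffill": s.ffill(),                        # carry the last known value forward
    "bfill": s.bfill(),                        # pull the next known value backward
    "linear": s.interpolate(method="linear"),  # straight line between known neighbors
    # Trailing 3-point moving average of the known values in the window.
    "moving_avg": s.fillna(s.rolling(window=3, min_periods=1).mean()),
    "zero": s.fillna(0.0),                     # sensible when "missing" means "nothing happened"
})
print(filled)
```

Which column is appropriate depends on what a gap means in the data: a missing sales record may really be zero sales, while a missing sensor reading usually is not.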
  • Resampling: resampling time series data gives flexibility in defining the resolution of the data

    • Upsampling: increase the sample frequency, e.g. from minutes to seconds. Care must be taken in deciding how the fine-grained samples are computed.
    • Downsampling: decrease the sample frequency, e.g. from days to months. Need to pay attention to how the aggregation is carried out.
    • Reasons for resampling:
      • Inspect the behavior of data under different resolutions
      • Join tables with different resolutions
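As a rough illustration of the two directions of resampling, the sketch below uses pandas `resample()` on a toy daily series. The bin sizes (2 days, 12 hours) and the values are assumptions chosen to keep the output small:

```python
import pandas as pd

# Hypothetical daily series.
idx = pd.date_range("2024-01-01", periods=4, freq="D")
daily = pd.Series([10.0, 20.0, 30.0, 40.0], index=idx)

# Downsampling: days -> 2-day bins. The aggregation decides the meaning:
# a sum suits counts or amounts, a mean suits levels such as temperature.
two_day_sum = daily.resample("2D").sum()
two_day_mean = daily.resample("2D").mean()

# Upsampling: days -> 12-hour steps. The new rows have no data until we
# decide how the fine-grained samples are computed.
step = pd.Timedelta(hours=12)
half_day_empty = daily.resample(step).asfreq()             # new rows are NaN
half_day_ffill = daily.resample(step).ffill()              # repeat the last value
half_day_linear = daily.resample(step).interpolate("linear")  # straight-line fill

print(two_day_sum, half_day_linear, sep="\n")
```

Once two tables share the same resolution, they can be joined on the time index.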
  • Smoothing samples, including outlier removal

    • Why
      • Part of the data preparation process
      • For visualization
    • How does smoothing affect the outcome
      • Cleaner data to model
      • Model compatibility
      • Production improvement?
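One possible way to combine outlier removal and smoothing is a rolling median as a robust baseline followed by a rolling mean. The sketch below is only an illustration: the window size and the `threshold` value are hypothetical and would need tuning for real data.

```python
import pandas as pd

# Hypothetical daily readings with one obvious spike.
idx = pd.date_range("2024-01-01", periods=7, freq="D")
raw = pd.Series([10.0, 11.0, 10.0, 100.0, 11.0, 10.0, 11.0], index=idx)

# A centered rolling median is barely affected by a single spike.
baseline = raw.rolling(window=3, center=True, min_periods=1).median()

# Flag points that sit far from the baseline (threshold is an assumption).
threshold = 20.0
is_outlier = (raw - baseline).abs() > threshold

# Replace flagged points with the baseline, then smooth with a rolling mean.
cleaned = raw.mask(is_outlier, baseline)
smoothed = cleaned.rolling(window=3, center=True, min_periods=1).mean()

print(pd.DataFrame({"raw": raw, "outlier": is_outlier, "smoothed": smoothed}))
```

Smoothing discards information as well as noise, so it is worth checking whether a model trained on the smoothed series still performs well on raw production data.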
  • Seasonality

    • Hourly, daily, quarterly, yearly
    • Spring, summer, fall, winter
    • Holidays
  • Time series sample correlations

    • Stationarity
      • How stable is the system
      • Does the past inform the future
    • Trends
      • Correlation issues
    • Autocorrelation
      • How strongly points in a time series are linearly related to earlier points in the same series
  • pandas offers many methods for handling time series data

    • Time-aware index
    • groupby and resample()
    • autocorr() method
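The pandas features listed above can be tried on a synthetic series. This sketch assumes a made-up daily signal with an exact 7-day cycle, so that the autocorrelation at lag 7 is easy to interpret:

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a 7-day cycle, on a time-aware DatetimeIndex.
idx = pd.date_range("2024-01-01", periods=56, freq="D")
s = pd.Series(np.sin(2 * np.pi * np.arange(56) / 7), index=idx)

# Autocorrelation: correlation of the series with a lagged copy of itself.
lag7 = s.autocorr(lag=7)  # one full cycle apart, so strongly positive
lag3 = s.autocorr(lag=3)  # roughly half a cycle apart, so negative

# groupby on a calendar attribute of the index exposes the weekly profile.
by_weekday = s.groupby(s.index.dayofweek).mean()

# resample() aggregates along the time axis, here to weekly means.
weekly_mean = s.resample("W").mean()

print(f"lag 7: {lag7:.3f}, lag 3: {lag3:.3f}")
print(by_weekday)
```

A high autocorrelation at a particular lag is a hint of seasonality with that period, which is information the past carries about the future.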
  • Time series algorithms offered by Amazon Forecast

    • ARIMA, autoregressive integrated moving average
    • DeepAR+
    • Exponential Smoothing (ETS)
    • Non-Parametric Time Series (NPTS)
    • Prophet
  • Model evaluation

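    • Time series models cannot be evaluated with standard (shuffled) $k$-fold cross validation: the data is ordered and correlated, so random folds would let the model train on observations that come after the ones it is tested on.
    • Standard approach: backtesting, which trains on the earlier part of the series and tests on the period that follows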
    [Figure: Wrapper method]
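    • Two metrics can be used to assess backtesting accuracy (backtesting is hindcasting rather than forecasting)
      • wQuantileLoss (weighted quantile loss): the error of the forecast at each quantile in a chosen set, with under- and over-prediction penalized according to the quantile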
      • RMSE, root mean square error
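To make backtesting and the two metrics concrete, here is a minimal sketch in NumPy. It is not Amazon Forecast's implementation: the helper names (`backtest_windows`, `weighted_quantile_loss`), the toy series, and the naive last-value forecast are all assumptions for illustration. The weighted quantile loss follows the definition as I understand it from the Amazon Forecast documentation, $2 \sum_t [\tau \max(y_t - q_t, 0) + (1 - \tau) \max(q_t - y_t, 0)] / \sum_t |y_t|$.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between actuals and point forecasts."""
    y, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((y - p) ** 2)))

def weighted_quantile_loss(actual, quantile_forecast, tau):
    """Weighted quantile loss at quantile tau (assumed definition, see lead-in)."""
    y, q = np.asarray(actual, float), np.asarray(quantile_forecast, float)
    under = tau * np.maximum(y - q, 0.0)        # forecast below the actual
    over = (1.0 - tau) * np.maximum(q - y, 0.0)  # forecast above the actual
    return float(2.0 * np.sum(under + over) / np.sum(np.abs(y)))

def backtest_windows(n_obs, horizon, n_windows):
    """Yield (train_end, test_end) so each test block follows its training data."""
    for k in range(n_windows, 0, -1):
        test_end = n_obs - (k - 1) * horizon
        yield test_end - horizon, test_end

# Hypothetical series; the "model" is a naive last-value forecast.
series = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0])
scores = []
for train_end, test_end in backtest_windows(len(series), horizon=2, n_windows=2):
    train, test = series[:train_end], series[train_end:test_end]
    forecast = np.full(len(test), train[-1])  # repeat the last training value
    scores.append(rmse(test, forecast))

print("RMSE per backtest window:", scores)
print("wQL[0.5]:", weighted_quantile_loss([10.0, 20.0], [12.0, 18.0], tau=0.5))
```

Unlike shuffled $k$-fold splits, every window here tests only on observations that come after its training data, which is the property backtesting is meant to preserve.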