Sliding and expanding windows are two powerful tools for analyzing sequential data, especially in time series analysis. While both approaches involve calculating statistics over a subset of data, they differ in how the window is defined and applied:

Sliding Windows: These involve a fixed-size window that moves or “rolls” across the data. Each window contains a fixed number of observations or a fixed time range. As new observations are added, older ones are dropped, keeping the window size constant. This is useful for calculating moving averages, rolling sums, or other statistics where each window is of equal size and provides a localized view of the data.

Expanding Windows: In contrast, expanding windows start at the beginning of the dataset and grow progressively with each new observation. Instead of maintaining a constant window size, the window includes all past observations up to the current point. This makes expanding windows particularly useful for cumulative statistics like running totals, cumulative averages, or cumulative counts, where you want to aggregate all previous data.
Both sliding and expanding windows are essential in different contexts, and choosing the right one depends on what kind of insight you’re trying to derive from the data. Sliding windows are great for keeping the analysis focused on a fixed period, offering a balanced view of recent changes. Expanding windows, on the other hand, give a broader, cumulative picture that grows over time, making them ideal for capturing long-term trends.
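To make the contrast concrete, here is a minimal sketch on a toy series (values are illustrative) showing both window types side by side:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# Sliding window: each result uses exactly 3 observations
# (the first two results are NaN because the window is not yet full)
rolling_mean = s.rolling(window=3).mean()

# Expanding window: each result uses all observations seen so far
expanding_mean = s.expanding().mean()
```

Here `rolling_mean` ends up as NaN, NaN, 20, 30, 40, while `expanding_mean` is 10, 15, 20, 25, 30 — the expanding average keeps absorbing history, while the rolling one forgets it.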
Overview of Rolling Windows in Pandas
Function Signature of DataFrame.rolling()
```python
DataFrame.rolling(window, min_periods=None, center=False, win_type=None,
                  on=None, axis=<no_default>, closed=None, step=None,
                  method='single')
```
The key parameters of the function are:

- `window` (required)
- `min_periods`
- `center`
- `win_type`
- `on`
- `axis`
- `closed`
- `step`
- `method`
Let’s go over these parameters in detail.
1. window (Required)
This parameter specifies the size of the moving window. It can accept different types of values:
- Fixed number of observations: An integer, such as `window=3`, defines a window of 3 consecutive observations.
- Variable window based on time offset: A string, such as `window='5D'`, defines a window based on a time duration (e.g., 5 days).
Key Concept:
This is the heart of the rolling operation, as it determines how many data points are included in each rolling window. When specifying time-based windows, the data must have a `DatetimeIndex`.
Example:
```python
df['rolling_mean'] = df['value'].rolling(window=3).mean()
```
SQL Equivalent:
- Fixed window: SQL supports rolling windows through the `OVER` clause combined with `ROWS` or `RANGE` specifications. For example, a fixed window of 3 rows can be achieved like this:
```sql
SELECT value,
       AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_avg
FROM table;
```
This tells SQL to compute the rolling average over the current row and the two preceding rows.
- Variable window (time-based): SQL can also handle time-based rolling windows using `RANGE` instead of `ROWS`. This is particularly useful when dealing with time-series data. For example, a rolling window based on a 7-day range:
```sql
SELECT value,
       AVG(value) OVER (ORDER BY date RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) AS rolling_avg
FROM table;
```
2. min_periods
- Default: `None` (in practice, this defaults to the window size for fixed windows and to 1 for offset-based windows)
- Description: This specifies the minimum number of observations in the window required to have a valid result. If fewer observations are available, the result is `NaN`.
Key Concept:
This parameter is crucial when dealing with incomplete or missing data. It controls how many non-`NaN` values are needed to compute the result for each window.
Example:
```python
# Will only compute the mean if at least 2 values are present in the window
df['rolling_mean'] = df['value'].rolling(window=3, min_periods=2).mean()
```
SQL Equivalent:
SQL does not have a direct `min_periods` equivalent, but you can often achieve similar functionality using conditional aggregation. For instance, you can count the number of rows in the window and only calculate the average if the count meets a threshold:
```sql
SELECT value,
       CASE
           WHEN COUNT(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) >= 2
           THEN AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
           ELSE NULL
       END AS rolling_avg
FROM table;
```
Here, the rolling average is only calculated if at least 2 values are present in the window.
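Back in Pandas, the resulting NaN pattern is easy to see on a toy series (illustrative values):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# window=3 alone would leave the first two results as NaN;
# min_periods=2 produces a result as soon as 2 values are available
result = s.rolling(window=3, min_periods=2).mean()
```

Only the very first row is NaN here: from the second row onward at least two observations are available, so the mean is computed.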
3. center
- Default: `False`
- Description: This parameter determines whether the window should be centered around the current observation. If `True`, the window is centered (i.e., the observation is in the middle of the window); otherwise, the window is right-aligned.
Key Concept:
Centering can be useful in certain analyses where you want to smooth the data symmetrically around each observation.
Example:
```python
# Center the rolling window
df['rolling_mean'] = df['value'].rolling(window=3, center=True).mean()
```
SQL Equivalent:
SQL doesn’t have a built-in equivalent of centered windows like Pandas does, but you can simulate the behavior by adjusting the window frame to include rows both before and after the current row. In SQL’s `OVER` clause, you typically define a moving window using `ROWS BETWEEN X PRECEDING AND Y FOLLOWING`. By adjusting both the preceding and following parts, you can create a centered window.
Let’s say you want a centered rolling window over 3 rows where the current row is in the center (1 row before, the current row, and 1 row after). You can specify this using the `ROWS BETWEEN` clause.
Here’s an example SQL query that achieves this:
```sql
SELECT value,
       AVG(value) OVER (
           ORDER BY date
           ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
       ) AS centered_rolling_avg
FROM table;
```
Explanation:
- `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING`: This defines a window that includes 1 row before the current row, the current row, and 1 row after the current row.
This effectively creates a centered window of size 3, similar to how Pandas’ `center=True` works with a rolling window.
4. win_type
- Default: `None`
- Description: This specifies the type of window function to use. Possible values include `'boxcar'`, `'triang'`, `'blackman'`, `'hamming'`, `'bartlett'`, and more. These are weighted windows that apply different weighting schemes to the values within the window (they correspond to SciPy window functions, so SciPy must be installed).
Key Concept:
Weighted windows are useful when you don’t want to treat all observations equally within the window. For example, in a Hamming window, values near the center of the window receive more weight than values at the edges.
Example:
```python
# Using a window with a Hamming function
df['weighted_rolling_mean'] = df['value'].rolling(window=5, win_type='hamming').mean()
```
SQL Equivalent:
SQL does not have built-in support for weighted window functions like Pandas. However, you can apply weights manually, for example by combining `LAG()` with a weighted sum over the current and preceding rows:
```sql
SELECT value,
       0.2 * LAG(value, 2) OVER (ORDER BY date)
     + 0.3 * LAG(value, 1) OVER (ORDER BY date)
     + 0.5 * value AS weighted_rolling_avg
FROM table;
```
This is a simple example of manually applying weights (here 0.2, 0.3, and 0.5 across a 3-row window), but it’s more cumbersome compared to Pandas’ built-in `win_type`.
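To demystify what a weighted window computes, here is a manual sketch using NumPy’s `np.hamming` weights; it assumes the weighted mean `sum(w * x) / sum(w)` over each window, which is how Pandas aggregates weighted windows:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

w = np.hamming(5)  # symmetric weights, largest at the window centre

# Weighted mean over each 5-observation window: sum(w * x) / sum(w)
weighted = s.rolling(window=5).apply(lambda x: np.sum(w * x) / w.sum(), raw=True)
```

Because the weights are symmetric and the data here is linear, each weighted mean lands on the window’s centre value (3, 4, 5), which is a handy sanity check.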
5. on
- Default: `None`
- Description: This specifies the column to use for the rolling window, typically a datetime column. It’s used when the DataFrame does not have a `DatetimeIndex`, but you still want to perform rolling operations based on a time-based column.
Key Concept:
If your DataFrame has a separate column for dates or timestamps (rather than the index), this parameter tells Pandas which column to use for time-based rolling windows.
Example:
```python
# Rolling window based on a datetime column
df['rolling_mean'] = df.rolling(window='7D', on='date')['value'].mean()
```
SQL Equivalent:
In SQL, the `ORDER BY` clause within the `OVER` clause serves a similar purpose. It specifies the column by which the rolling window is ordered (typically a date column for time-series data).
```sql
SELECT value,
       AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_avg
FROM table;
```
Here, the `date` column is used to define the order of the rolling window, similar to how the `on` parameter works in Pandas.
6. axis
- Default: `0` (rows)
- Description: This specifies the axis over which the rolling window is applied. `0` applies the window over rows, while `1` applies the window over columns. (Note: the `axis` argument is deprecated in recent Pandas versions; transposing the data is the recommended alternative.)
Key Concept:
Typically, the rolling operation is done over rows (`axis=0`), but it can also be applied across columns with `axis=1`.
Example:
```python
# Rolling mean across columns instead of rows
# (all columns must be numeric; the result is a DataFrame)
rolling_across_columns = df.rolling(window=3, axis=1).mean()
```
SQL Equivalent:
SQL operates in a row-wise manner by default, so there is no direct equivalent to changing the axis in SQL. SQL window functions are always applied across rows, and pivoting or transposing data to operate across columns is not straightforward. You would need to reshape the data using `PIVOT` or other methods, which is not as flexible as Pandas’ `axis=1`.
7. closed
- Default: `None`
- Description: This defines whether the intervals are closed on the right, left, both, or neither. Possible values are `'right'`, `'left'`, `'both'`, and `'neither'`.
Key Concept:
This is particularly useful when working with time-based rolling windows, as it determines whether the current time point is included in the window.
Example:
```python
# Rolling window with intervals closed on both sides
df['rolling_mean'] = df.rolling(window='7D', closed='both', on='date')['value'].mean()
```
SQL Equivalent:
SQL provides control over whether the window is inclusive or exclusive via the `BETWEEN` clause in the `ROWS` or `RANGE` specification. However, SQL typically defaults to right-closed (`<=`) intervals, and there is no built-in mechanism for fully controlling the window closure as in Pandas.
For example, SQL’s default behavior is usually equivalent to a right-closed interval:
```sql
SELECT value,
       AVG(value) OVER (ORDER BY date RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) AS rolling_avg
FROM table;
```
To simulate a left-closed window, you would need to manipulate the `ORDER BY` and window logic, but this is not straightforward in SQL.
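In Pandas itself, the effect is easy to demonstrate on toy daily data (illustrative values):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4],
              index=pd.date_range('2023-01-01', periods=4, freq='D'))

# Offset windows default to closed='right': the current point is included
right_closed = s.rolling('2D').sum()

# closed='left' excludes the current point from its own window
left_closed = s.rolling('2D', closed='left').sum()
```

`right_closed` comes out as 1, 3, 5, 7, while `left_closed` is NaN, 1, 3, 5 — each window shifts back by one point because the current observation is excluded.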
8. step
- Default: `None`
- Description: This defines the step size between windows. For example, `step=2` means the rolling function is evaluated at every second observation, skipping one in between.
Key Concept:
This parameter allows you to control the frequency at which the rolling operation is applied. It’s useful when you want less granular results.
Example:
```python
# Rolling mean evaluated at every second observation
# (the result has fewer rows than the original Series)
rolling_sampled = df['value'].rolling(window=3, step=2).mean()
```
SQL Equivalent:
SQL does not have a built-in equivalent for skipping rows in rolling windows. However, you can simulate this behavior by computing the rolling statistic over all rows first and then filtering on a row number (filtering before computing the window would change which rows each window sees):
```sql
WITH numbered_rows AS (
    SELECT value,
           ROW_NUMBER() OVER (ORDER BY date) AS row_num,
           AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_avg
    FROM table
)
SELECT value, rolling_avg
FROM numbered_rows
WHERE row_num % 2 = 1;
```
This query uses `ROW_NUMBER()` to generate row numbers and then keeps every second row (`row_num % 2 = 1` keeps the first, third, fifth, … rows, matching Pandas’ `step=2`, which starts at the first observation).
9. method
- Default: `'single'`
- Description: This parameter defines whether the operation is performed over a single column or row at a time (`'single'`) or over the entire DataFrame at once (`'table'`). The `'table'` option is only implemented when `engine='numba'` is specified in the aggregation call, and can help with large datasets.
Key Concept:
For large datasets, using `'table'` together with the Numba engine can improve performance by computing over the entire table in one pass.
Example:
```python
# Rolling operation over the entire table at once
# (method='table' requires engine='numba' in the aggregation call)
rolling_means = df.rolling(window=3, method='table').mean(engine='numba')
```
SQL Equivalent:
SQL databases handle performance optimization internally, so there is no direct control over whether window functions are computed single-threaded or multi-threaded. The SQL query planner and optimizer will automatically determine the best execution strategy, depending on the database engine (PostgreSQL, MySQL, etc.) and its parallel processing capabilities.
Some databases (e.g., PostgreSQL) offer ways to control parallelism in general, but there is no direct equivalent to Pandas’ `method` parameter.
Fixed vs. Variable Windows
An important distinction is between fixed windows (based on a number of observations) and variable windows (based on time offsets):
Fixed Windows
- These are windows that contain a fixed number of observations, regardless of their timestamp or index.
- Example: `rolling(window=3)` will always calculate the rolling statistic over 3 consecutive data points.
Variable Windows (Offset-based)
- These windows are based on time intervals, which adjust dynamically based on the data’s timestamps.
- Example: `rolling(window='7D')` will include all data points within the last 7 days for each calculation, which may be more or fewer than a fixed number of rows.
Example:
```python
# Fixed window of 3 observations
df['fixed_window_mean'] = df['value'].rolling(window=3).mean()
# Variable window of 7 days
df['variable_window_mean'] = df.rolling(window='7D', on='date')['value'].mean()
```
Function Signature of DataFrame.expanding()
```python
DataFrame.expanding(min_periods=1, axis=<no_default>, method='single')
```
This function returns a window object that enables expanding computations (e.g., cumulative sums, means) on the data. We’ll explore the three main parameters of this function: `min_periods`, `axis`, and `method`.
1. min_periods
- Type: `int`
- Default: `1`
- Description: This parameter specifies the minimum number of observations in the expanding window required to have a valid result. If the number of observations is less than `min_periods`, the result will be `NaN`.
Key Concepts:
- Cumulative Calculations: In an expanding window, the window starts from the first observation and includes all previous values up to the current point. The `min_periods` parameter ensures that a result is only computed after a certain number of observations have been included in the window.
- Handling Missing Data: If your data contains missing values (`NaN`), you can use `min_periods` to control when the calculation will start producing non-`NaN` results. For instance, setting `min_periods=3` means that the expanding window will only return a result starting with the third row, as there won’t be enough data points before that.
Example:
```python
# Expanding window with a minimum of 3 observations
df['expanding_mean'] = df['value'].expanding(min_periods=3).mean()
```
In this example, the cumulative mean will only start being calculated at the third row. The first two rows will contain `NaN`.
Important Concept:
- When to Use: The `min_periods` parameter is especially important when working with datasets that may not have many observations in the early part of the series. It prevents early calculations when there isn’t enough data, ensuring that the results are statistically meaningful.
In SQL, similar cumulative operations can be achieved using window functions. SQL window functions, particularly cumulative aggregates like `SUM()`, `AVG()`, and `COUNT()` with the `OVER` clause, can replicate the core behavior of expanding windows. However, there are some differences in how SQL handles these operations compared to Pandas.
SQL Equivalent:
SQL does not have a direct equivalent for `min_periods`, but you can replicate the behavior using conditional logic (e.g., `CASE` statements) to check the number of observations in the window and return `NULL` if the count is below a certain threshold.
Example:
Suppose you want to calculate the cumulative sum in SQL but only return a result if there are at least 3 rows (equivalent to `min_periods=3` in Pandas).
```sql
SELECT value,
       CASE
           WHEN COUNT(value) OVER (ORDER BY date) >= 3
           THEN SUM(value) OVER (ORDER BY date)
           ELSE NULL
       END AS expanding_sum
FROM table;
```
In this query:
- `COUNT(value) OVER (ORDER BY date)` counts the number of rows up to the current row.
- The `CASE` statement checks if the count is greater than or equal to 3. If true, it calculates the cumulative sum using `SUM() OVER`, otherwise it returns `NULL`.
This mimics the behavior of `min_periods` in Pandas.
2. axis
- Type: `int` or `str`
- Default: `0`
- Description: This parameter specifies the axis along which the expanding window is applied. It can be either `0` (or `'index'`) to apply the expanding operation row-wise (default), or `1` (or `'columns'`) to apply it column-wise.
Key Concepts:
- Row-wise (`axis=0`): The expanding operation is applied to each column independently, down the rows. This is the most common use case, especially for time series, where you want to calculate cumulative statistics over time for each column.
- Column-wise (`axis=1`): The expanding operation is applied across the columns for each row. This can be useful in cases where the columns represent different variables, and you want to calculate cumulative statistics across them.
Example:
```python
# Apply expanding mean row-wise (default)
df['expanding_mean'] = df['value'].expanding().mean()
# Apply expanding sum across columns (returns a DataFrame)
expanding_sum_across_columns = df.expanding(axis=1).sum()
```
Important Concept:
- When to Use: In most time-series applications, you’ll use `axis=0` because you’re typically interested in cumulative operations over rows (e.g., dates). Using `axis=1` is less common but can be useful in cases where you have multiple features in columns and want to aggregate across them for each row.
3. method
- Type: `str`
- Options: `{'single', 'table'}`
- Default: `'single'`
- Description: This parameter controls how the expanding operation is executed:
  - `'single'`: Performs the expanding operation over a single column or row at a time.
  - `'table'`: Executes the expanding operation over the entire DataFrame or Series in one go (only implemented with `engine='numba'` in the aggregation call), which can be more efficient depending on the operation and the size of the data.
Key Concepts:
- Single vs. Table:
  - The `'single'` method applies the expanding window function to each column or row individually. This is the default behavior and is often sufficient for smaller datasets or when working with one column at a time.
  - The `'table'` method processes the entire DataFrame or Series in one operation, which can be faster for large datasets but may require more memory and computational resources.
Example:
```python
# Apply expanding sum using the single method (default)
df['expanding_sum'] = df['value'].expanding(method='single').sum()
# Apply expanding mean using the table method (requires engine='numba')
expanding_means = df.expanding(method='table').mean(engine='numba')
```
Important Concept:
- When to Use: The `method='table'` option can be more efficient for large datasets, especially when performing operations on multiple columns, but it may not always be necessary for smaller datasets. Use the `'table'` method when you’re working with a large DataFrame and need performance optimization.
Running Totals with Additional Conditions
Sometimes, you may need to calculate running totals based on specific conditions, such as filtering certain rows.
SQL Approach
In SQL, incorporating conditions into window functions can be achieved using `CASE WHEN` statements within the aggregate function or by applying a `WHERE` clause in a subquery.
Example Scenario: Calculate a running total of sales amounts, but only include sales from a specific region.
```sql
SELECT
    date,
    region,
    amount,
    SUM(CASE WHEN region = 'North' THEN amount ELSE 0 END) OVER (
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total_north
FROM
    sales
ORDER BY
    date;
```
Explanation:
- `CASE WHEN region = 'North' THEN amount ELSE 0 END`: Ensures that only sales from the ‘North’ region are included in the sum.
- `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`: Defines the window from the start of the dataset up to the current row, creating a running total.
Pandas Approach
Pandas allows for more straightforward conditional operations using boolean indexing and the `cumsum()` function.
Example Scenario: Using the same sales data, calculate a running total for the ‘North’ region.
```python
import pandas as pd

# Sample data
data = {
    'date': pd.date_range(start='2023-01-01', periods=10, freq='D'),
    'region': ['North', 'South', 'North', 'East', 'North', 'West', 'North', 'South', 'North', 'East'],
    'amount': [100, 150, 200, 130, 170, 160, 180, 190, 210, 220]
}
df = pd.DataFrame(data)

# Ensure data is sorted by date
df = df.sort_values('date')

# Calculate running total for 'North' region
df['running_total_north'] = (
    df[df['region'] == 'North']
    .sort_values('date')['amount']
    .cumsum()
)

# Fill NaN for non-North regions (forward-fill, then 0 before the first North sale)
df['running_total_north'] = df['running_total_north'].ffill().fillna(0)

print(df)
```
Output:
```
        date region  amount  running_total_north
0 2023-01-01  North     100                100.0
1 2023-01-02  South     150                100.0
2 2023-01-03  North     200                300.0
3 2023-01-04   East     130                300.0
4 2023-01-05  North     170                470.0
5 2023-01-06   West     160                470.0
6 2023-01-07  North     180                650.0
7 2023-01-08  South     190                650.0
8 2023-01-09  North     210                860.0
9 2023-01-10   East     220                860.0
```
Explanation:
- `df[df['region'] == 'North']`: Filters the DataFrame to include only rows from the ‘North’ region.
- `.cumsum()`: Calculates the cumulative sum.
- Forward filling followed by `fillna(0)`: Fills `NaN` values for non-North regions by carrying the last running total forward and replacing any remaining `NaN` (before the first North sale) with 0.
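A more direct translation of the SQL `CASE WHEN` pattern is to zero out non-matching rows with `where()` before taking the cumulative sum; this avoids the filter-and-forward-fill step entirely (a sketch using the same sample data):

```python
import pandas as pd

data = {
    'date': pd.date_range(start='2023-01-01', periods=10, freq='D'),
    'region': ['North', 'South', 'North', 'East', 'North', 'West', 'North', 'South', 'North', 'East'],
    'amount': [100, 150, 200, 130, 170, 160, 180, 190, 210, 220]
}
df = pd.DataFrame(data).sort_values('date')

# Keep 'North' amounts, replace everything else with 0, then accumulate —
# mirroring SQL's SUM(CASE WHEN region = 'North' THEN amount ELSE 0 END) OVER (...)
df['running_total_north'] = df['amount'].where(df['region'] == 'North', 0).cumsum()
```

This produces the same running totals as the filter-and-fill approach, in a single expression.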