forward_fill

API documentation for tradingstrategy.utils.forward_fill.forward_fill Python function.

forward_fill(single_or_multipair_data, freq, columns=('open', 'high', 'low', 'close', 'volume', 'timestamp'), drop_other_columns=True, forward_fill_until=None)

Forward-fill OHLCV data for multiple trading pairs.

Forward fill certain candle columns.

If multiple pairs are given as a GroupBy, each pair is filled only between its own first and last timestamp (min(pair_timestamp), max(timestamp)), not across the full time range of all data.
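
The per-pair range behaviour can be illustrated with plain pandas. This is a minimal sketch and not the library's implementation: the candle values and the helper fill_pair are hypothetical, and it uses resample with ffill in place of forward_fill itself.

```python
import pandas as pd

# Hypothetical daily candles: pair 2 starts later and ends earlier than pair 1
candles = pd.DataFrame(
    {
        "pair_id": [1, 1, 2, 2],
        "open": [100.0, 104.0, 50.0, 53.0],
        "close": [101.0, 105.0, 51.0, 54.0],
    },
    index=pd.DatetimeIndex(
        ["2023-01-01", "2023-01-05", "2023-01-02", "2023-01-04"],
        name="timestamp",
    ),
)


def fill_pair(df: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Fill gaps between the first and last candle of a single pair."""
    # resample() only spans this pair's own first-to-last timestamp
    return df[["open", "close"]].resample(freq).last().ffill()


# Each pair is filled independently, so the ranges differ per pair
filled = {
    pair_id: fill_pair(df, "D") for pair_id, df in candles.groupby("pair_id")
}

for pair_id, df in filled.items():
    print(pair_id, df.index.min().date(), df.index.max().date(), len(df))
```

In this sketch pair 2 gets no rows for 2023-01-01 or 2023-01-05, because those dates fall outside its own range.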

Note

The timestamp and pair_id columns are deleted in this process.
  • Do not use these columns; use the corresponding indexes instead.
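
Reading these values from the index rather than from columns could look like the following. This is a sketch in plain pandas that assumes the data is indexed by a (pair_id, timestamp) MultiIndex; that layout and the sample values are assumptions for illustration, not something this page confirms.

```python
import pandas as pd

# Hypothetical frame with an assumed (pair_id, timestamp) MultiIndex
index = pd.MultiIndex.from_tuples(
    [
        (1, pd.Timestamp("2023-01-01")),
        (1, pd.Timestamp("2023-01-02")),
        (2, pd.Timestamp("2023-01-01")),
    ],
    names=["pair_id", "timestamp"],
)
df = pd.DataFrame({"close": [101.0, 101.0, 51.0]}, index=index)

# Read the values from the index levels instead of columns
timestamps = df.index.get_level_values("timestamp")
pair_ids = df.index.get_level_values("pair_id")

# Select a single pair; the result is indexed by timestamp only
pair_1 = df.xs(1, level="pair_id")
print(pair_1)
```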

Example:

import os

from tradingstrategy.chain import ChainId
from tradingstrategy.client import Client
from tradingstrategy.timebucket import TimeBucket
from tradingstrategy.utils.forward_fill import forward_fill
from tradingstrategy.utils.groupeduniverse import fix_bad_wicks

from tradeexecutor.strategy.execution_context import python_script_execution_context
from tradeexecutor.strategy.trading_strategy_universe import load_all_data
from tradeexecutor.strategy.universe_model import UniverseOptions

client = Client.create_jupyter_client()

chain_id = ChainId.polygon
time_bucket = TimeBucket.d1
exchange_slug = "uniswap-v3"

exchanges = client.fetch_exchange_universe()
uni = exchanges.get_by_chain_and_slug(chain_id, exchange_slug)

dataset = load_all_data(
    client,
    time_frame=time_bucket,
    execution_context=python_script_execution_context,
    universe_options=UniverseOptions(),
    with_liquidity=False,
)

# Select the pair ids that belong to our target exchange
pair_universe = dataset.pairs
pair_ids = pair_universe.loc[pair_universe["exchange_id"] == uni.exchange_id]["pair_id"]
filtered_df = dataset.candles.loc[dataset.candles["pair_id"].isin(pair_ids)]

# Index the candles by timestamp
filtered_df = filtered_df.set_index("timestamp")

# Sanitise price data
filtered_df = fix_bad_wicks(filtered_df)

# Forward fill per pair so that there are no gaps in the data
filtered_df = filtered_df.groupby("pair_id")
pairs_df = forward_fill(
    filtered_df,
    freq=time_bucket.to_frequency(),
    columns=("open", "high", "low", "close", "volume"),
)

# Write a Parquet file under /tmp
fpath = f"/tmp/{chain_id.get_slug()}-{exchange_slug}-candles-{time_bucket.value}.parquet"
flattened_df = pairs_df.obj
flattened_df = flattened_df.reset_index().set_index("timestamp")  # Get rid of grouping
flattened_df.to_parquet(fpath)
print(f"Wrote {fpath} {os.path.getsize(fpath):,} bytes")

Parameters:
  • single_or_multipair_data (pandas.core.frame.DataFrame | pandas.core.groupby.generic.DataFrameGroupBy) –

    Candle data for single or multiple trading pairs

    • GroupBy DataFrame containing candle data for multiple trading pairs (grouped by column pair_id).

    • Normal DataFrame containing candle data for a single pair

  • freq (pandas._libs.tslibs.offsets.DateOffset | str) – The target frequency for the DataFrame.

  • columns (Collection[str]) –

    Columns to fill.

    To save memory and time, fill only the columns you need. Usually open and close are enough. The default shown in the signature above fills open, high, low, close, volume and timestamp.

    To get all OHLC data, set this to ("open", "high", "low", "close").

    If the data has a timestamp column, it is filled with the first value.

  • drop_other_columns

    Remove other columns before forward-fill to save memory.

    The resulting DataFrame will only have columns listed in columns parameter.

    The removed columns include ones like high and low, but also Trading Strategy specific columns like start_block and end_block. It’s unlikely we are going to need forward-filled data in these columns.

    Note

    There is no logic for forward-filling arbitrary columns, only the columns mentioned above.

  • forward_fill_until (pandas._libs.tslibs.timestamps.Timestamp | None) –

    The timestamp up to which we know the data is valid.

    If rarely traded pairs have price gaps at the end of the (live) OHLCV series, the data is forward-filled until this timestamp.

    If not given, the data is forward-filled until the last trade of the pair.

    The timestamp must match the index timestamp frequency.
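
The effect of forward_fill_until on a single pair can be sketched with plain pandas. This is an illustration under stated assumptions, not the library's implementation: the helper fill_until is hypothetical, and the alignment check is one possible reading of the requirement that the timestamp matches the index frequency.

```python
import pandas as pd


def fill_until(df: pd.DataFrame, freq: str, until: pd.Timestamp) -> pd.DataFrame:
    """Forward fill a single pair up to and including `until`."""
    # Assumed interpretation: `until` must sit exactly on a frequency boundary
    if until != until.floor(freq):
        raise ValueError(f"{until} is not aligned to frequency {freq}")
    # Never shorten the series if the data already extends past `until`
    end = max(df.index.max(), until)
    full_index = pd.date_range(df.index.min(), end, freq=freq, name=df.index.name)
    # Missing rows, including the trailing ones, take the last known value
    return df.reindex(full_index).ffill()


# Hypothetical pair whose last trade was on 2023-01-03
close = pd.DataFrame(
    {"close": [10.0, 11.0]},
    index=pd.DatetimeIndex(["2023-01-01", "2023-01-03"], name="timestamp"),
)

extended = fill_until(close, "D", pd.Timestamp("2023-01-06"))
print(extended)
```

Without the until argument the series in this sketch would end on 2023-01-03, the last trade of the pair.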

Returns:

A DataFrame where each timestamp has a value set for the given columns.

For multipair data, if the input is a DataFrameGroupBy, a similar DataFrameGroupBy is returned.

Return type:

DataFrame