fix_dex_price_data

API documentation for tradingstrategy.utils.wrangle.fix_dex_price_data Python function.

fix_dex_price_data(df, freq=None, forward_fill=True, bad_open_close_threshold=3.0, fix_wick_threshold=(0.1, 1.9), fix_inbetween_threshold=(-0.99, 5.0), min_max_price=(1e-08, 1000000.0), remove_candles_with_zero=True, pair_id_column='pair_id', forward_fill_until=None)

Wrangle DEX price data for all known issues.

  • Fix broken open/high/low/close values so that they are less likely to cause problems for algorithms

  • Wrangling is the process of massaging incoming price/liquidity data to correct issues we may have encountered during data collection

  • A common DEX data issue is absurd high/low price spikes caused by MEV trades

  • Some open/close values are also “broken” in the sense that they do not reflect the market price you would be able to trade at, again likely due to MEV

  • Before calling this function, call normalise_volume() on the OHLCV data
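To make the spike issue above concrete, here is a minimal pandas sketch that flags wicks using the same numbers as the fix_wick_threshold default (0.1, 1.9) from the signature. The toy data and the repair step (clamping the wick to the candle body) are assumptions made for illustration; this is not the library's own implementation, see fix_bad_wicks() for that.

```python
import pandas as pd

# Toy candles for a single pair. The second row has an MEV-style high spike,
# the third row has a low spike.
candles = pd.DataFrame(
    {
        "open": [100.0, 101.0, 102.0],
        "high": [102.0, 950.0, 104.0],
        "low": [99.0, 100.0, 1.0],
        "close": [101.0, 102.0, 103.0],
    },
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)

# Same numbers as the fix_wick_threshold default, read as multipliers of close
low_mult, high_mult = 0.1, 1.9

# Flag wicks that stray too far from the close
bad_high = candles["high"] > candles["close"] * high_mult
bad_low = candles["low"] < candles["close"] * low_mult

# Assumption: repair a flagged wick by clamping it to the candle body
body_high = candles[["open", "close"]].max(axis=1)
body_low = candles[["open", "close"]].min(axis=1)
fixed = candles.copy()
fixed.loc[bad_high, "high"] = body_high[bad_high]
fixed.loc[bad_low, "low"] = body_low[bad_low]

print(fixed)
```

Only the two spiked values change; candles whose wicks stay within the thresholds are left untouched.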

Example:

# After we know pair ids that fill the liquidity criteria,
# we can build OHLCV dataset for these pairs
print(f"Downloading/opening OHLCV dataset {time_bucket}")
price_df = client.fetch_all_candles(time_bucket).to_pandas()
print(f"Filtering out {len(top_liquid_pair_ids)} pairs")
price_df = price_df.loc[price_df.pair_id.isin(top_liquid_pair_ids)]

print("Wrangling DEX price data")
price_df = price_df.set_index("timestamp", drop=False)

# Normalise volume datapoints
price_df = normalise_volume(price_df)

# Convert to grouped data
price_dfgb = price_df.groupby("pair_id")

price_dfgb = fix_dex_price_data(
    price_dfgb,
    freq=time_bucket.to_frequency(),
    forward_fill=True,
)

print(f"Retrofitting OHLCV columns for human readability")
# Pull the underlying DataFrame back out of the grouped result
price_df = price_dfgb.obj
price_df["pair_id"] = price_df.index.get_level_values(0)
price_df["ticker"] = price_df.apply(lambda row: make_full_ticker(pair_metadata[row.pair_id]), axis=1)
price_df["link"] = price_df.apply(lambda row: make_link(pair_metadata[row.pair_id]), axis=1)

# Export data, make sure we got columns in an order we want
print(f"Writing OHLCV CSV")
del price_df["timestamp"]
del price_df["pair_id"]
price_df = price_df.reset_index()
column_order = ('ticker', 'timestamp', 'open', 'high', 'low', 'close', 'volume', 'link', 'pair_id',)
price_df = price_df.reindex(columns=column_order)  # Sort columns in a specific order
price_df.to_csv(
  price_output_fname,
)
print(f"Wrote {price_output_fname}, {price_output_fname.stat().st_size:,} bytes")
Parameters:
  • df (pandas.core.frame.DataFrame | pandas.core.groupby.generic.DataFrameGroupBy) –

    Price dataframe with OHLCV data.

    May contain columns named open, close, high, low, volume and timestamp.

    For multipair data this must be DataFrameGroupBy.

  • freq (pandas._libs.tslibs.offsets.DateOffset | str | None) –

    The incoming Pandas frequency of the data, e.g. “d” for daily.

    If the incoming data frequency and freq parameter do not match, the data is resampled to the given frequency.

  • fix_wick_threshold (tuple | None) –

    Apply abnormal high/low wick fix filter.

    Maximum allowed high/low wick relative to close, given as a (low, high) tuple of multipliers. With the default (0.1, 1.9), values are fixed where low is 90% lower than close or high is 90% higher than close.

    See fix_bad_wicks() for more information.

  • bad_open_close_threshold (float | None) – See fix_bad_wicks().

  • pair_id_column – The pair/reserve id column name in the dataframe.

  • remove_candles_with_zero (bool) –

    Remove candles with zero values for OHLC, to deal with abnormal data.

  • min_max_price (tuple | None) –

    Remove candles where the open value falls outside the given (min, max) price range.

    See remove_min_max_price().

  • forward_fill (bool) –

    Forward-fill gaps in the data.

    Forward-filling data will delete any unknown columns; see tradingstrategy.utils.forward_fill.forward_fill() for details.

  • forward_fill_until (Optional[Union[datetime, Timestamp]]) –

    The timestamp which we know the data is valid for.

    If there are price gaps for rarely traded pairs at the end of the (live) OHLCV series, we will forward fill the data until this timestamp.

    If not given, forward fills until the last trade of the pair.

  • fix_inbetween_threshold (tuple | None) –

Returns:

Fixed data frame.

If forward fill is used, all other columns outside OHLCV are dropped.
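As a rough illustration of what forward-filling gaps up to a known timestamp can look like for a single pair, the sketch below uses plain pandas. The fill rule (gap candles are flat at the previous close with zero volume) and the fill_until variable are assumptions made for this example; the actual behaviour is defined by tradingstrategy.utils.forward_fill.forward_fill().

```python
import pandas as pd

# Daily candles for one rarely traded pair; 2024-01-02 and 2024-01-03 are missing
candles = pd.DataFrame(
    {
        "open": [10.0, 12.0],
        "high": [11.0, 13.0],
        "low": [9.0, 11.5],
        "close": [10.5, 12.5],
        "volume": [500.0, 300.0],
    },
    index=pd.to_datetime(["2024-01-01", "2024-01-04"]),
)

# Hypothetical stand-in for forward_fill_until: data is known valid up to here
fill_until = pd.Timestamp("2024-01-05")

# Build the complete daily index and insert empty rows for the gaps
full_index = pd.date_range(candles.index[0], fill_until, freq="D")
filled = candles.reindex(full_index)

# Assumption: a gap candle is flat at the previous close with zero volume
filled["close"] = filled["close"].ffill()
for column in ("open", "high", "low"):
    filled[column] = filled[column].fillna(filled["close"])
filled["volume"] = filled["volume"].fillna(0.0)

print(filled)
```

Because the index is extended to fill_until, the trailing gap after the last trade on 2024-01-04 is filled as well, not only the gaps between trades.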

Return type:

DataFrame
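The remove_candles_with_zero and min_max_price filters described under Parameters can be pictured with the following pandas sketch. It reuses the default min_max_price range from the signature on toy data; the exact filtering logic is an assumption for illustration and may differ from what remove_min_max_price() does.

```python
import pandas as pd

# Toy candles: the second row is all zeros, the third has an absurdly small price
candles = pd.DataFrame(
    {
        "open": [1.0, 0.0, 5e-12, 2.0],
        "high": [1.1, 0.0, 6e-12, 2.2],
        "low": [0.9, 0.0, 4e-12, 1.9],
        "close": [1.05, 0.0, 5e-12, 2.1],
    },
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
)

# Same numbers as the min_max_price default
min_price, max_price = 1e-08, 1_000_000.0

# Candles with a zero in any OHLC column
ohlc = candles[["open", "high", "low", "close"]]
has_zero = (ohlc == 0).any(axis=1)

# Candles whose open value lies inside the allowed range (inclusive bounds)
in_range = candles["open"].between(min_price, max_price)

cleaned = candles[~has_zero & in_range]
print(cleaned)
```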