fix_dex_price_data

API documentation for tradingstrategy.utils.wrangle.fix_dex_price_data Python function.

fix_dex_price_data(df, freq=None, forward_fill=True, bad_open_close_threshold=3.0, fix_wick_threshold=(0.1, 1.9), fix_inbetween_threshold=(-0.99, 5.0), min_max_price=(1e-08, 1000000.0), remove_candles_with_zero=True, pair_id_column='pair_id')

Wrangle DEX price data for all known issues.

  • Fix broken open/high/low/close values so that they are less likely to cause problems for algorithms

  • Wrangling is the process of massaging incoming price/liquidity data to correct issues encountered during data collection

  • Common DEX data issues are absurd price high/low spikes due to MEV trades

  • Some open/close values are also “broken” in the sense that they do not reflect a market price you would be able to trade at, again likely due to MEV
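To illustrate the kind of spike repair described above, here is a minimal pandas sketch. It is not the library's implementation: the helper name clamp_wicks_sketch and the repair rule (snapping an outlying high or low back to the candle body) are assumptions made for illustration. Only the (0.1, 1.9) default threshold is taken from the signature above.

```python
import pandas as pd


def clamp_wicks_sketch(df: pd.DataFrame, threshold=(0.1, 1.9)) -> pd.DataFrame:
    """Hypothetical helper: repair high/low values that are absurd relative to close.

    threshold is a (low, high) pair of multipliers of the close price.
    """
    low_ratio, high_ratio = threshold
    out = df.copy()

    # The candle body is the range spanned by open and close
    body_low = out[["open", "close"]].min(axis=1)
    body_high = out[["open", "close"]].max(axis=1)

    # Flag wicks that fall outside the allowed band around close
    bad_low = out["low"] < out["close"] * low_ratio
    bad_high = out["high"] > out["close"] * high_ratio

    # Assumed repair rule: snap the offending wick back to the candle body
    out.loc[bad_low, "low"] = body_low[bad_low]
    out.loc[bad_high, "high"] = body_high[bad_high]
    return out


candles = pd.DataFrame(
    {
        "open": [100.0, 101.0, 102.0],
        "high": [102.0, 5000.0, 103.0],  # second candle has an absurd high
        "low": [99.0, 100.0, 0.5],       # third candle has an absurd low
        "close": [101.0, 102.0, 101.5],
    },
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

fixed = clamp_wicks_sketch(candles)
print(fixed)
```

The first candle passes through unchanged, while the spiked high and low in the other two are pulled back to the open/close range. See fix_bad_wicks() for what the library actually does.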

Example:

# After we know pair ids that fill the liquidity criteria,
# we can build OHLCV dataset for these pairs
print(f"Downloading/opening OHLCV dataset {time_bucket}")
price_df = client.fetch_all_candles(time_bucket).to_pandas()
print(f"Filtering out {len(top_liquid_pair_ids)} pairs")
price_df = price_df.loc[price_df.pair_id.isin(top_liquid_pair_ids)]

print("Wrangling DEX price data")
price_df = price_df.set_index("timestamp", drop=False).groupby("pair_id")
price_df = fix_dex_price_data(
    price_df,
    freq=time_bucket.to_frequency(),
    forward_fill=True,
)

print("Retrofitting OHLCV columns for human readability")
price_df = price_df.obj
price_df["pair_id"] = price_df.index.get_level_values(0)
price_df["ticker"] = price_df.apply(lambda row: make_full_ticker(pair_metadata[row.pair_id]), axis=1)
price_df["link"] = price_df.apply(lambda row: make_link(pair_metadata[row.pair_id]), axis=1)

# Export data, make sure we got columns in an order we want
print("Writing OHLCV CSV")
del price_df["timestamp"]
del price_df["pair_id"]
price_df = price_df.reset_index()
column_order = ('ticker', 'timestamp', 'open', 'high', 'low', 'close', 'volume', 'link', 'pair_id',)
price_df = price_df.reindex(columns=column_order)  # Sort columns in a specific order
price_df.to_csv(
  price_output_fname,
)
print(f"Wrote {price_output_fname}, {price_output_fname.stat().st_size:,} bytes")
Parameters:
  • df (pandas.core.frame.DataFrame | pandas.core.groupby.generic.DataFrameGroupBy) –

    Price dataframe with OHLCV data.

    May contain columns named open, close, high, low, volume and timestamp.

    For multipair data this must be DataFrameGroupBy.

  • freq (pandas._libs.tslibs.offsets.DateOffset | str | None) –

    The incoming Pandas frequency of the data, e.g. “d” for daily.

    If the incoming data frequency and the freq parameter do not match, the data is resampled to the given frequency.

  • fix_wick_threshold (tuple | None) –

    Apply abnormal high/low wick fix filter.

    Maximum allowed low/high wick relative to close, given as a (low, high) pair of multipliers. With the default (0.1, 1.9), values are fixed where low is more than 90% below close or high is more than 90% above close.

    See fix_bad_wicks() for more information.

  • bad_open_close_threshold (float | None) – See fix_bad_wicks().

  • pair_id_column – The pair/reserve id column name in the dataframe.

  • remove_candles_with_zero (bool) –

    Remove candles with zero values for OHLC.

    To deal with abnormal data.

  • min_max_price (tuple | None) –

    Remove candles where the open value falls outside the given (min, max) price range, to catch values that are nonsensical as floating point prices.

    See remove_min_max_price().

  • forward_fill (bool) –

    Forward-fill gaps in the data.

    Forward-filling data will delete any unknown columns, see tradingstrategy.utils.forward_fill.forward_fill() for details.

  • fix_inbetween_threshold (tuple | None) –

Returns:

Fixed data frame.

If forward fill is used, all other columns outside OHLCV are dropped.

Return type:

DataFrame
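As a rough illustration of what forward-filling gaps means for OHLCV data, the following plain pandas sketch places sparse candles on a regular daily grid. The fill rule used here (synthetic candles take the previous close for open, high, low and close, with zero volume) is an assumption for illustration. Consult tradingstrategy.utils.forward_fill.forward_fill() for the library's actual behaviour.

```python
import pandas as pd

# Two candles with a two-day gap between them (no trades on Jan 2 and Jan 3)
sparse = pd.DataFrame(
    {
        "open": [1.0, 1.2],
        "high": [1.1, 1.3],
        "low": [0.9, 1.1],
        "close": [1.05, 1.25],
        "volume": [500.0, 700.0],
    },
    index=pd.to_datetime(["2024-01-01", "2024-01-04"]),
)

# Put the data on a regular daily grid; the missing days become NaN rows
dense = sparse.asfreq("D")

# Carry the last known close forward into the gap
dense["close"] = dense["close"].ffill()

# Assumed rule: a gap candle is flat at the previous close
for col in ("open", "high", "low"):
    dense[col] = dense[col].fillna(dense["close"])

# No trades happened in the gap, so volume is zero
dense["volume"] = dense["volume"].fillna(0.0)

print(dense)
```

Only the OHLCV columns are handled here, which mirrors the note above that columns outside OHLCV are dropped when forward fill is used.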