fix_dex_price_data
API documentation for tradingstrategy.utils.wrangle.fix_dex_price_data Python function.
- fix_dex_price_data(df, freq=None, forward_fill=True, bad_open_close_threshold=3.0, fix_wick_threshold=(0.1, 1.9), fix_inbetween_threshold=(-0.99, 5.0), min_max_price=(1e-08, 1000000.0), remove_candles_with_zero=True, pair_id_column='pair_id', forward_fill_until=None)
Wrangle DEX price data for all known issues.
Fix broken open/high/low/close values so that they are less likely to cause problems for algorithms.
Wrangling is the process of massaging incoming price/liquidity data to correct issues that may have been introduced during data collection.
Common DEX data issues are absurd high/low price spikes caused by MEV trades.
Some open/close values are also “broken” in the sense that they do not reflect a market price at which you could actually trade, again likely due to MEV.
Before calling this function on OHLCV data, call normalise_volume().
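To make the MEV spike problem concrete, the following minimal sketch flags candles whose high or low is implausibly far from the close. It is not the library's fix_bad_wicks() implementation. The `Candle` class and `has_bad_wick` helper are hypothetical names, and reading the `(0.1, 1.9)` default of fix_wick_threshold as (low, high) multipliers of the close is an assumption drawn from the signature above.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    """Hypothetical container for one OHLC candle."""
    open: float
    high: float
    low: float
    close: float


def has_bad_wick(candle: Candle, threshold: tuple = (0.1, 1.9)) -> bool:
    """Return True if the low or high is implausibly far from the close.

    Assumption: the threshold is read as (low, high) multipliers of the close.
    """
    low_mult, high_mult = threshold
    too_low = candle.low < candle.close * low_mult
    too_high = candle.high > candle.close * high_mult
    return too_low or too_high


candles = [
    Candle(open=100.0, high=102.0, low=99.0, close=101.0),  # ordinary candle
    Candle(open=100.0, high=950.0, low=99.0, close=101.0),  # high spike
    Candle(open=100.0, high=102.0, low=0.5, close=101.0),   # low spike
]

flags = [has_bad_wick(c) for c in candles]
print(flags)
```

In this sketch only the second and third candles are flagged. The sketch does not show how a flagged wick is repaired.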
Example:
    # After we know pair ids that fill the liquidity criteria,
    # we can build OHLCV dataset for these pairs
    print(f"Downloading/opening OHLCV dataset {time_bucket}")
    price_df = client.fetch_all_candles(time_bucket).to_pandas()
    print(f"Filtering out {len(top_liquid_pair_ids)} pairs")
    price_df = price_df.loc[price_df.pair_id.isin(top_liquid_pair_ids)]

    print("Wrangling DEX price data")
    price_df = price_df.set_index("timestamp", drop=False)

    # Normalise volume datapoints
    price_df = normalise_volume(price_df)

    # Convert to grouped data
    price_dfgb = price_df.groupby("pair_id")
    price_dfgb = fix_dex_price_data(
        price_dfgb,
        freq=time_bucket.to_frequency(),
        forward_fill=True,
    )

    print("Retrofitting OHLCV columns for human readability")
    price_df = price_df.obj
    price_df["pair_id"] = price_df.index.get_level_values(0)
    price_df["ticker"] = price_df.apply(lambda row: make_full_ticker(pair_metadata[row.pair_id]), axis=1)
    price_df["link"] = price_df.apply(lambda row: make_link(pair_metadata[row.pair_id]), axis=1)

    # Export data, make sure we got columns in an order we want
    print("Writing OHLCV CSV")
    del price_df["timestamp"]
    del price_df["pair_id"]
    price_df = price_df.reset_index()
    column_order = ('ticker', 'timestamp', 'open', 'high', 'low', 'close', 'volume', 'link', 'pair_id',)
    price_df = price_df.reindex(columns=column_order)  # Sort columns in a specific order
    price_df.to_csv(
        price_output_fname,
    )
    print(f"Wrote {price_output_fname}, {price_output_fname.stat().st_size:,} bytes")
- Parameters:
df (pandas.core.frame.DataFrame | pandas.core.groupby.generic.DataFrameGroupBy) –
Price dataframe with OHLCV data.
May contain columns named open, close, high, low, volume and timestamp.
For multipair data this must be DataFrameGroupBy.
freq (pandas._libs.tslibs.offsets.DateOffset | str | None) –
The incoming Pandas frequency of the data, e.g. “d” for daily.
If the incoming data frequency and the freq parameter do not match, the data is resampled to the given frequency.
fix_wick_threshold (tuple | None) –
Apply abnormal high/low wick fix filter.
Percent value of the maximum allowed high/low wick relative to close. By default, fix values where low is 90% lower than close and high is 90% higher than close.
See fix_bad_wicks() for more information.
bad_open_close_threshold (float | None) –
See fix_bad_wicks().
pair_id_column –
The pair/reserve id column name in the dataframe.
remove_candles_with_zero (bool) –
Remove candles that have zero values for OHLC.
This guards against abnormal data.
min_max_price (tuple | None) –
Remove candles where the open value falls outside this (min, max) floating point price range.
forward_fill (bool) –
Forward-fill gaps in the data.
Forward-filling data will delete any unknown columns, see tradingstrategy.utils.forward_fill.forward_fill() for details.
forward_fill_until (Optional[Union[datetime, Timestamp]]) –
The timestamp which we know the data is valid for.
If rarely traded pairs have price gaps at the end of the (live) OHLCV series, the data is forward-filled until this timestamp.
If not given, the data is forward-filled until the last trade of the pair.
fix_inbetween_threshold (tuple | None) –
- Returns:
Fixed data frame.
If forward fill is used, all columns other than OHLCV are dropped.
- Return type:
DataFrame
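The forward-fill behaviour described for forward_fill and forward_fill_until can be illustrated with plain pandas. This is a sketch of the general technique for a single pair, not the code in tradingstrategy.utils.forward_fill.forward_fill(). The `valid_until` name is a hypothetical stand-in for forward_fill_until. Filling open/high/low of empty candles with the previous close and setting their volume to zero are assumptions made for the illustration.

```python
import pandas as pd

# A sparse daily series for one rarely traded pair: no trades on Jan 2-3
df = pd.DataFrame(
    {
        "open": [1.00, 1.10],
        "high": [1.05, 1.20],
        "low": [0.95, 1.05],
        "close": [1.02, 1.15],
        "volume": [500.0, 300.0],
    },
    index=pd.DatetimeIndex(["2024-01-01", "2024-01-04"], name="timestamp"),
)

# Hypothetical stand-in for forward_fill_until:
# the timestamp up to which the data is known to be valid
valid_until = pd.Timestamp("2024-01-06")

# Build a gapless daily index from the first candle to valid_until
full_index = pd.date_range(df.index[0], valid_until, freq="D", name="timestamp")
filled = df.reindex(full_index)

# Carry the last known close forward into the gaps
filled["close"] = filled["close"].ffill()

# Assumption: an empty candle opens, peaks and bottoms at the carried close
for column in ("open", "high", "low"):
    filled[column] = filled[column].fillna(filled["close"])

# Assumption: no trades happened in the filled candles
filled["volume"] = filled["volume"].fillna(0.0)

print(filled)
```

Without `valid_until`, the index would end at the last traded candle (2024-01-04 here), which matches the documented default of filling only until the last trade of the pair.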