fix_dex_price_data
API documentation for tradingstrategy.utils.wrangle.fix_dex_price_data Python function.
- fix_dex_price_data(df, freq=None, forward_fill=True, bad_open_close_threshold=3.0, fix_wick_threshold=(0.1, 1.9), fix_inbetween_threshold=(-0.99, 5.0), min_max_price=(1e-08, 1000000.0), remove_candles_with_zero=True, pair_id_column='pair_id')
Wrangle DEX price data for all known issues.
Fix broken open/high/low/close values so that they are less likely to cause problems for algorithms.
Wrangling is the process of massaging incoming price/liquidity data to correct issues we have encountered during data collection.
Common DEX data issues are absurd high/low price spikes caused by MEV trades.
Some open/close values are also “broken” in the sense that they do not reflect a market price you would actually be able to trade at, again likely due to MEV.
Example:
    # After we know pair ids that fill the liquidity criteria,
    # we can build OHLCV dataset for these pairs
    print(f"Downloading/opening OHLCV dataset {time_bucket}")
    price_df = client.fetch_all_candles(time_bucket).to_pandas()
    print(f"Filtering out {len(top_liquid_pair_ids)} pairs")
    price_df = price_df.loc[price_df.pair_id.isin(top_liquid_pair_ids)]

    print("Wrangling DEX price data")
    price_df = price_df.set_index("timestamp", drop=False).groupby("pair_id")
    price_df = fix_dex_price_data(
        price_df,
        freq=time_bucket.to_frequency(),
        forward_fill=True,
    )

    print(f"Retrofitting OHLCV columns for human readability")
    price_df = price_df.obj
    price_df["pair_id"] = price_df.index.get_level_values(0)
    price_df["ticker"] = price_df.apply(lambda row: make_full_ticker(pair_metadata[row.pair_id]), axis=1)
    price_df["link"] = price_df.apply(lambda row: make_link(pair_metadata[row.pair_id]), axis=1)

    # Export data, make sure we got columns in an order we want
    print(f"Writing OHLCV CSV")
    del price_df["timestamp"]
    del price_df["pair_id"]
    price_df = price_df.reset_index()
    column_order = ('ticker', 'timestamp', 'open', 'high', 'low', 'close', 'volume', 'link', 'pair_id',)
    price_df = price_df.reindex(columns=column_order)  # Sort columns in a specific order
    price_df.to_csv(
        price_output_fname,
    )
    print(f"Wrote {price_output_fname}, {price_output_fname.stat().st_size:,} bytes")
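To make the kinds of problems described above more concrete, the following is a minimal pandas sketch that only flags suspicious candles on a toy dataset. It is not the library's implementation. The helpers flag_bad_wicks, flag_zero_candles and flag_out_of_range are hypothetical names, and treating the default fix_wick_threshold of (0.1, 1.9) as multipliers of the close price is an assumption based on the parameter description ("low is 90% lower than close and high is 90% higher than close"). All prices are made up.

```python
import pandas as pd

# Toy single-pair candles; every value here is invented for illustration
candles = pd.DataFrame(
    {
        "open": [100.0, 101.0, 0.0, 102.0, 5e6],
        "high": [102.0, 450.0, 103.0, 104.0, 5e6],
        "low": [99.0, 100.0, 0.0, 5.0, 5e6],
        "close": [101.0, 102.0, 102.0, 103.0, 5e6],
    },
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)


def flag_bad_wicks(df, threshold=(0.1, 1.9)):
    """Hypothetical helper: True where low or high leaves the band around close.

    Assumes the threshold tuple holds (low, high) multipliers of the close price.
    """
    low_mult, high_mult = threshold
    too_low = df["low"] < df["close"] * low_mult
    too_high = df["high"] > df["close"] * high_mult
    return too_low | too_high


def flag_zero_candles(df):
    """Hypothetical helper: True where any OHLC value is exactly zero."""
    return (df[["open", "high", "low", "close"]] == 0).any(axis=1)


def flag_out_of_range(df, min_max_price=(1e-08, 1000000.0)):
    """Hypothetical helper: True where open is outside the (min, max) range."""
    min_price, max_price = min_max_price
    return (df["open"] < min_price) | (df["open"] > max_price)


bad_wicks = flag_bad_wicks(candles)
zero_candles = flag_zero_candles(candles)
out_of_range = flag_out_of_range(candles)

# Drop candles failing the zero and range checks; wick problems are only flagged
clean = candles[~(zero_candles | out_of_range)]
print(clean)
```

The sketch keeps flagging separate from removal so that each check can be inspected on its own. How fix_dex_price_data actually repairs a flagged wick is described in fix_bad_wicks() and is not reproduced here.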
- Parameters:
df (pandas.core.frame.DataFrame | pandas.core.groupby.generic.DataFrameGroupBy) –
Price dataframe with OHLCV data.
May contain columns named open, close, high, low, volume and timestamp.
For multipair data this must be DataFrameGroupBy.
freq (pandas._libs.tslibs.offsets.DateOffset | str | None) –
The incoming Pandas frequency of the data, e.g. “d” for daily.
If the incoming data frequency and freq parameter do not match, the data is resampled to the given frequency.
fix_wick_threshold (tuple | None) –
Apply abnormal high/low wick fix filter.
Percent value of maximum allowed high/low wick relative to close. By default fix values where low is 90% lower than close and high is 90% higher than close.
See fix_bad_wicks() for more information.
bad_open_close_threshold (float | None) –
See fix_bad_wicks().
pair_id_column –
The pair/reserve id column name in the dataframe.
remove_candles_with_zero (bool) –
Remove candles that have zero values for OHLC, to deal with abnormal data.
min_max_price (tuple | None) –
Remove candles where the open value is outside the given (min, max) price range.
forward_fill (bool) –
Forward-fill gaps in the data.
Forward-filling data will delete any unknown columns, see tradingstrategy.utils.forward_fill.forward_fill() for details.
fix_inbetween_threshold (tuple | None) –
- Returns:
Fixed data frame.
If forward fill is used, all other columns outside OHLCV are dropped.
- Return type:
DataFrame
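The note about dropped columns can be illustrated with a small pandas sketch of forward-filling on toy data. This is a sketch under stated assumptions, not the code in tradingstrategy.utils.forward_fill: forward_fill_sketch is a hypothetical helper, and the choices to fill open/high/low with the last known close and to set volume to zero for synthetic candles are assumptions made for the illustration.

```python
import pandas as pd

# Toy daily candles with 2024-01-03 and 2024-01-04 missing.
# "note" stands in for any non-OHLCV column; all values are invented.
candles = pd.DataFrame(
    {
        "open": [100.0, 101.0, 104.0],
        "high": [102.0, 103.0, 106.0],
        "low": [99.0, 100.0, 103.0],
        "close": [101.0, 102.0, 105.0],
        "volume": [10.0, 12.0, 8.0],
        "note": ["a", "b", "c"],
    },
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"]),
)


def forward_fill_sketch(df, freq="D"):
    """Hypothetical helper: fill gaps in OHLCV data at the given frequency."""
    # Keep only OHLCV columns; anything else is dropped
    ohlcv = df[["open", "high", "low", "close", "volume"]]
    # Reindex onto a regular grid; missing candles become NaN rows
    filled = ohlcv.asfreq(freq)
    # Assumption: a missing candle carries the last known close forward
    filled["close"] = filled["close"].ffill()
    for column in ("open", "high", "low"):
        filled[column] = filled[column].fillna(filled["close"])
    # Assumption: no trades happened in a synthetic candle
    filled["volume"] = filled["volume"].fillna(0.0)
    return filled


filled = forward_fill_sketch(candles)
print(filled)
```

If columns such as a ticker or link are needed afterwards, they have to be re-attached after wrangling, which is what the retrofitting step in the example at the top of this page does.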