PairGroupedUniverse#

API documentation for the tradingstrategy.utils.groupeduniverse.PairGroupedUniverse Python class in the Trading Strategy framework.

class PairGroupedUniverse[source]#

Bases: object

A base class for manipulating columnar price/liquidity data by pair.

The server streams the data for all pairs in a single continuous time-indexed format. For most use cases, we want to look up and manipulate data by pair. To achieve this, we group the data with pandas DataFrame.groupby() and recompile it on the client side.

This works for

OHLCV candles
Liquidity candles
Lending reserves (one PairGroupedUniverse per metric, such as supply APR and borrow APR)

By default, the input pd.DataFrame is sorted by the timestamp column, which is then set as the index. This is not optimised (the operation is not done in place).
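
As a rough illustration of the sort-then-group step described above, the sketch below uses only the standard library. The row layout and the `group_by_pair` helper are assumptions made for this example; the class itself works on a pandas DataFrame.

```python
from collections import defaultdict
from datetime import datetime

# Flat, server-style rows: every pair mixed together (hypothetical sample data)
rows = [
    {"pair_id": 2, "timestamp": datetime(2021, 10, 12), "close": 10.5},
    {"pair_id": 1, "timestamp": datetime(2021, 10, 11), "close": 1.00},
    {"pair_id": 1, "timestamp": datetime(2021, 10, 12), "close": 1.10},
    {"pair_id": 2, "timestamp": datetime(2021, 10, 11), "close": 10.0},
]


def group_by_pair(rows, primary_key_column="pair_id", timestamp_column="timestamp"):
    """Sort rows by timestamp, then bucket them by the primary key."""
    grouped = defaultdict(list)
    for row in sorted(rows, key=lambda r: r[timestamp_column]):
        grouped[row[primary_key_column]].append(row)
    return dict(grouped)


pairs = group_by_pair(rows)
```

Each value in `pairs` is a time-ordered list of samples for one pair, which is the shape most per-pair lookups need.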

__init__(df, time_bucket=TimeBucket.d1, timestamp_column='timestamp', index_automatically=True, fix_wick_threshold=(0.1, 1.9), fix_inbetween_threshold=(-0.99, 5.0), primary_key_column='pair_id', remove_candles_with_zero_volume=True, forward_fill=False, bad_open_close_threshold=3.0, autoheal_pair_limit=1500, forward_fill_until=None, min_max_price=(1e-08, 1000000.0))[source]#

Set up new candle universe where data is grouped by trading pair.

Parameters:

df (DataFrame) – DataFrame backing the data.
time_bucket (TimeBucket) –
What bar size candles we are operating at. Defaults to daily.

TODO: Currently not used. Will be removed in a future version.
timestamp_column (str) – Which column to use to build the time index. Used for QSTrader / Backtrader compatibility.
index_automatically (bool) – Convert the index to a time series index. You might want to avoid this with QSTrader-style data.
fix_wick_threshold (tuple | None) –
Apply abnormal high/low wick fix filter.

Percent value of maximum allowed high/low wick relative to close. By default fix values where low is 90% lower than close and high is 90% higher than close.

See fix_bad_wicks() for more information.
bad_open_close_threshold (float | None) – See fix_bad_wicks().
primary_key_column (str) – The pair/reserve id column name in the dataframe.
remove_zero_candles –
Remove candles with zero values for OHLC.

To deal with abnormal data.
forward_fill (bool) –
Forward-fill gaps in the data.

See forward fill and forward filling data for more information.
autoheal_pair_limit –
Don’t try to autoheal data if the candle universe is too large.

Autohealing is a very taxing operation and should not be performed on large universes. Instead, you should preprocess the universe to a candles Parquet file and load directly from there.
autoheal_limit – If we have more than
fix_inbetween_threshold (tuple | None) –
remove_candles_with_zero_volume (bool) –
forward_fill_until (datetime.datetime | pandas._libs.tslibs.timestamps.Timestamp | None) –
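
One way to read the default fix_wick_threshold=(0.1, 1.9) is as a pair of ratios relative to close: a low below 10% of close or a high above 190% of close counts as abnormal. The sketch below only shows that arithmetic. The `is_abnormal_wick` helper and this interpretation of the tuple are assumptions for illustration; see fix_bad_wicks() for what the filter actually does with such candles.

```python
def is_abnormal_wick(low, high, close, threshold=(0.1, 1.9)):
    """Return True if low or high is outside the allowed band around close."""
    min_ratio, max_ratio = threshold
    return low < close * min_ratio or high > close * max_ratio


# Normal candle: both wicks stay near the close price
normal = is_abnormal_wick(low=95.0, high=105.0, close=100.0)

# Low wick is more than 90% below close
bad_low = is_abnormal_wick(low=5.0, high=105.0, close=100.0)

# High wick is more than 90% above close
bad_high = is_abnormal_wick(low=95.0, high=250.0, close=100.0)
```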

Methods

`__init__`(df[, time_bucket, ...])	Set up new candle universe where data is grouped by trading pair.
`clear_cache`()	Clear candles cached by pair.
`create_from_multiple_candle_dataframes`(dfs)	Construct universe based on multiple trading pairs.
`create_from_single_pair_dataframe`(df[, bucket])	Construct universe based on a single trading pair data.
`forward_fill`([columns, drop_other_columns])	Forward-fill sparse OHLCV candle data.
`get_all_pairs`([max_count])	Go through all liquidity samples, one DataFrame per trading pair.
`get_all_samples_by_range`(start, end)	Get list of candles/samples for all pairs at a certain range.
`get_all_samples_by_timestamp`(ts)	Get list of candles/samples for all pairs at a certain timepoint.
`get_columns`()	Get column names from the underlying pandas.GroupBy object
`get_last_entries_by_pair_and_timestamp`(pair, ...)	Get samples for a single pair before a timestamp.
`get_pair_count`()	Return the number of pairs in this dataset.
`get_pair_ids`()	Get all pairs present in the dataset
`get_prior_timestamp`(ts)	Get the first timestamp in the index that is before the given timestamp.
`get_sample_count`()	Return the dataset size - how many samples total for all pairs
`get_samples_by_pair`(pair_id)	Get samples for a single pair.
`get_single_pair_data`([timestamp, ...])	Get all candles/liquidity samples for the only pair in the universe up to a certain timestamp.
`get_single_value`(asset_id, when, ...[, ...])	Get a single value for a single pair/asset at a specific point of time.
`get_timestamp_range`([use_timezone, ...])	Return the time range of the data we have.
`is_forward_filled`()	Check if the data was forward filled after the data loading.
`iterate_samples_by_pair_range`(start, end)	Get list of candles/samples for all pairs at a certain range.

Attributes

candles_cache

Grouped DataFrame cache for faster lookup

candles_cache: dict[int, pandas.core.frame.DataFrame]#: Grouped DataFrame cache for faster lookup

is_forward_filled()[source]#

Check if the data was forward filled after the data loading.

Returns: True if the data is forward filled, False otherwise.
Return type: bool

clear_cache()[source]#: Clear candles cached by pair.

get_columns()[source]#

Get column names from the underlying pandas.GroupBy object

Return type: Index

get_sample_count()[source]#

Return the dataset size - how many samples total for all pairs

Return type: int

get_pair_count()[source]#

Return the number of pairs in this dataset.

TODO: Rename. Also used by lending reserves, and this then refers to count of reserves, not pairs.

Return type: int

get_samples_by_pair(pair_id)[source]#

Get samples for a single pair.

After the samples have been extracted, set timestamp as the index for the data.

Returns: Data frame group
Raises: KeyError – If we do not have data for pair_id
Parameters: pair_id (int) –
Return type: DataFrame

get_last_entries_by_pair_and_timestamp(pair, timestamp, small_time=Timedelta('0 days 00:00:01'))[source]#

Get samples for a single pair before a timestamp.

Return a DataFrame slice containing all datapoints before the timestamp.

We assume the timestamp is the current decision frame. For example, for daily close data, return the previous day's close to prevent any lookahead bias.

Parameters:

pair_id – Integer id for a trading pair
timestamp (pandas._libs.tslibs.timestamps.Timestamp | datetime.datetime) – Get all samples excluding this timestamp.
pair (tradingstrategy.pair.DEXPair | int) –

Returns:

Dataframe that contains samples for a single trading pair.

Indexed by timestamp.

Raises:

KeyError – If we do not have data for pair_id

Return type:

DataFrame
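
The exclusion of the current timestamp can be sketched with a binary search over a sorted list of timestamps. This is a minimal standard-library illustration, not the library's implementation; the `entries_before` helper and the sample data are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime

# Daily timestamps and close prices for one pair (hypothetical data)
timestamps = [datetime(2021, 10, d) for d in (11, 12, 13, 14)]
closes = [1.00, 1.10, 1.05, 1.20]


def entries_before(timestamps, values, timestamp):
    """Return (timestamp, value) samples strictly before ``timestamp``."""
    cut = bisect_left(timestamps, timestamp)
    return list(zip(timestamps[:cut], values[:cut]))


# The decision is made at 2021-10-13, so the candle of the 13th is excluded
visible = entries_before(timestamps, closes, datetime(2021, 10, 13))
```

The last visible sample is the close of the 12th, so a strategy deciding on the 13th never sees the candle that is still forming.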

get_all_pairs(max_count=None)[source]#

Go through all liquidity samples, one DataFrame per trading pair.

Parameters: max_count – Randomly sample N pairs
Return type: Iterable[Tuple[int, DataFrame]]

get_pair_ids()[source]#

Get all pairs present in the dataset

Return type: Iterable[int]

get_all_samples_by_timestamp(ts)[source]#

Get list of candles/samples for all pairs at a certain timepoint.

Raises: KeyError – The universe does not contain a sample for the given timepoint
Returns: A DataFrame that contains candles/samples at the specific timepoint
Parameters: ts (Timestamp) –
Return type: DataFrame

get_all_samples_by_range(start, end)[source]#

Get list of candles/samples for all pairs at a certain range.

Useful to get the last few samples for multiple pairs.

Example:

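# Assumes `client` is an already initialised Trading Strategy client
from pandas import Timestamp
from tradingstrategy.timebucket import TimeBucket
from tradingstrategy.candle import GroupedCandleUniverse

# Set up timestamps for a 3-week range, with one week in the middle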
end = Timestamp('2021-10-25 00:00:00')
start = Timestamp('2021-10-11 00:00:00')
middle = start + (end - start) / 2

# Get weekly candles
raw_candles = client.fetch_all_candles(TimeBucket.d7).to_pandas()
candle_universe = GroupedCandleUniverse(raw_candles)
candles = candle_universe.get_all_samples_by_range(start, end)

# We have pair data for 3 different weeks
assert len(candles.index.unique()) == 3

# Each week has its own candles broken down by pair,
# and they can be uniquely addressed by their pair_id
assert len(candles.loc[start]) >= 1000
assert len(candles.loc[middle]) >= 1000
assert len(candles.loc[end]) >= 1000

Parameters:

start (Timestamp) – start of the range (inclusive)
end (Timestamp) – end of the range (inclusive)

Returns:

A DataFrame that contains candles/samples for all pairs at the range.

Return type:

DataFrame
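
Both endpoints of the range are inclusive. The sketch below shows the same selection rule on a plain list of tuples; the `samples_by_range` helper and the data are hypothetical and only illustrate the `start <= ts <= end` condition.

```python
from datetime import datetime

# (timestamp, pair_id, close) samples for two pairs (hypothetical data)
samples = [
    (datetime(2021, 10, 4), 1, 0.9),
    (datetime(2021, 10, 11), 1, 1.0),
    (datetime(2021, 10, 11), 2, 10.0),
    (datetime(2021, 10, 18), 1, 1.1),
    (datetime(2021, 10, 25), 2, 11.0),
    (datetime(2021, 11, 1), 1, 1.3),
]


def samples_by_range(samples, start, end):
    """Keep samples whose timestamp satisfies start <= ts <= end."""
    return [s for s in samples if start <= s[0] <= end]


start = datetime(2021, 10, 11)
end = datetime(2021, 10, 25)
selected = samples_by_range(samples, start, end)
unique_weeks = sorted({ts for ts, _, _ in selected})
```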

iterate_samples_by_pair_range(start, end)[source]#

Get list of candles/samples for all pairs at a certain range.

Useful to get the last few samples for multiple pairs.

Example:

# Assumes `client` is an already initialised Trading Strategy client
import pandas as pd

raw_candles = client.fetch_all_candles(TimeBucket.d7).to_pandas()
candle_universe = GroupedCandleUniverse(raw_candles)

# Calibrate our week
random_date = pd.Timestamp("2021-10-29")
end = candle_universe.get_prior_timestamp(random_date)
assert end == pd.Timestamp("2021-10-25")

# Because we are using weekly candles,
# and start and end are inclusive endpoints,
# we should get 3 weeks of samples
start = pd.Timestamp(end) - pd.Timedelta(weeks=2)

for pair_id, pair_df in candle_universe.iterate_samples_by_pair_range(start, end):
    # Because of missing samples, some pairs may have different ranges.
    # In this example, we iterate 3 weeks ranges, so we can have
    # 1, 2 or 3 weekly candles.
    # If there was no data at all pair_id is not present in the result.
    range_start = pair_df.index[0]
    range_end = pair_df.index[-1]
    assert range_start <= range_end
    # Calculate the momentum for the full range of all samples
    first_candle = pair_df.iloc[0]
    last_candle = pair_df.iloc[-1]
    # Relative price change from the first open to the last close
    momentum = (last_candle["close"] - first_candle["open"]) / first_candle["open"]

Parameters:

start (Timestamp) – start of the range (inclusive)
end (Timestamp) – end of the range (inclusive)

Returns:

DataFrame.groupby result

Return type:

DataFrame
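
A self-contained version of the per-pair momentum calculation, computed here as (last close - first open) / first open, might look like the following. It uses plain tuples instead of DataFrames, and the `momentum_by_pair` helper and candle values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Weekly candles as (pair_id, timestamp, open, close); pair 2 misses one week
candles = [
    (1, datetime(2021, 10, 11), 100.0, 105.0),
    (1, datetime(2021, 10, 18), 105.0, 110.0),
    (1, datetime(2021, 10, 25), 110.0, 120.0),
    (2, datetime(2021, 10, 11), 50.0, 48.0),
    (2, datetime(2021, 10, 25), 47.0, 45.0),
]


def momentum_by_pair(candles, start, end):
    """Relative change from the first open to the last close, per pair."""
    by_pair = defaultdict(list)
    for pair_id, ts, open_, close in sorted(candles, key=lambda c: c[1]):
        if start <= ts <= end:
            by_pair[pair_id].append((open_, close))
    return {
        pair_id: (rows[-1][1] - rows[0][0]) / rows[0][0]
        for pair_id, rows in by_pair.items()
    }


momentum = momentum_by_pair(candles, datetime(2021, 10, 11), datetime(2021, 10, 25))
```

Pairs with missing weeks still get a value as long as at least one candle falls inside the range; pairs with no candles in the range do not appear in the result.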

get_timestamp_range(use_timezone=False, exclude_forward_fill=False)[source]#

Return the time range of the data we have.

Note

Because we assume multipair data, the data is grouped by pair and not indexed as a time series. Thus, this function can be a slow operation.

Parameters:

use_timezone –

The resulting timestamps will have their timezone set to UTC. If not set then naive timestamps are generated.

Legacy option. Do not use.

Returns:

(start timestamp, end timestamp) tuple, UTC-timezone aware

If the data frame is empty, return (None, None).

Return type:

Tuple[Optional[Timestamp], Optional[Timestamp]]

get_prior_timestamp(ts)[source]#

Get the first timestamp in the index that is before the given timestamp.

This allows us to calibrate weekly/4 hours/etc. indexes to any given time.

Example:

# Assumes `client` is an already initialised Trading Strategy client
import pandas as pd

raw_candles = client.fetch_all_candles(TimeBucket.d7).to_pandas()
candle_universe = GroupedCandleUniverse(raw_candles)

# Calibrate our week
random_date = pd.Timestamp("2021-10-29")
weekly_ts_before = candle_universe.get_prior_timestamp(random_date)

assert weekly_ts_before == pd.Timestamp("2021-10-25")

Returns: Any timestamp from the index that is before or at the same time as the given timestamp.
Parameters: ts (Timestamp) –
Return type: Timestamp
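
The lookup can be pictured as a binary search over a sorted index. The sketch below is a standard-library illustration under that assumption; the `prior_timestamp` helper is hypothetical, and returning None when nothing precedes the timestamp is a choice made for this example, not documented behaviour.

```python
from bisect import bisect_right
from datetime import datetime

# Sorted weekly index (hypothetical data)
index = [datetime(2021, 10, d) for d in (4, 11, 18, 25)]


def prior_timestamp(index, ts):
    """Return the latest index entry that is <= ts, or None if there is none."""
    pos = bisect_right(index, ts)
    return index[pos - 1] if pos > 0 else None


weekly_ts_before = prior_timestamp(index, datetime(2021, 10, 29))
exact_match = prior_timestamp(index, datetime(2021, 10, 18))
too_early = prior_timestamp(index, datetime(2021, 10, 1))
```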

get_single_pair_data(timestamp=None, sample_count=None, allow_current=False, raise_on_not_enough_data=True, time_range_epsilon_seconds=0.5)[source]#

Get all candles/liquidity samples for the only pair in the universe up to a certain timestamp.

A shortcut method for trading strategies that trade only one pair. Designed to be a backtesting-friendly and live-trading-friendly way to access candle data.

Example:

Note

By default, get_single_pair_data() returns the candles prior to the timestamp; this behavior can be changed with get_single_pair_data(allow_current=True). At the start of the backtest, we do not have any previous candle available yet, so this function may raise NoDataAvailable.

Parameters:

timestamp (Optional[Timestamp]) – Get the sample until this timestamp and all previous samples.
allow_current –
Allow reading the candle precisely at the timestamp. If you read the candle of your current strategy cycle timestamp, bad things may happen.

In backtesting, reading the candle at the current timestamp introduces forward-looking bias. In live trading, reading the candle at the current timestamp may give you no candle or an incomplete candle (trades are still piling up on it).
sample_count (Optional[int]) –
Minimum candle/liquidity sample count needed.

Limit the number of returned candles to N candles before the timestamp.

If the data does not have enough samples before timestamp, then raise NoDataAvailable.
raise_on_not_enough_data –
Raise an error if no data is available.

This can be e.g. because the trading pair has
time_range_epsilon_seconds – The time delta epsilon we use to determine between “current” and “previous” candle.

Raises:

NoDataAvailable –

Raised when there is no data available at the range.

Set fail_on_empty=False to return an empty DataFrame instead.

Return type:

DataFrame
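
The interplay of timestamp, sample_count and allow_current can be sketched as follows. This is a simplified standard-library model, not the method's implementation: the `single_pair_window` helper, the local `NoDataAvailable` stand-in and the way the epsilon is applied are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


class NoDataAvailable(Exception):
    """Local stand-in for the exception named in this section."""


def single_pair_window(candles, timestamp, sample_count=None,
                       allow_current=False, epsilon=timedelta(seconds=0.5)):
    """Select candles visible at ``timestamp`` from a time-sorted list."""
    # Without allow_current, anything at or within epsilon of the timestamp is hidden
    cutoff = timestamp if allow_current else timestamp - epsilon
    visible = [c for c in candles if c[0] <= cutoff]
    if sample_count is not None:
        if len(visible) < sample_count:
            raise NoDataAvailable(f"Wanted {sample_count} candles, got {len(visible)}")
        visible = visible[-sample_count:]
    return visible


# Daily (timestamp, close) candles for the only pair (hypothetical data)
candles = [
    (datetime(2021, 10, 11), 1.00),
    (datetime(2021, 10, 12), 1.10),
    (datetime(2021, 10, 13), 1.05),
    (datetime(2021, 10, 14), 1.20),
]

now = datetime(2021, 10, 13)
previous = single_pair_window(candles, now, sample_count=2)
current = single_pair_window(candles, now, sample_count=2, allow_current=True)
```

With the default settings the window ends at the candle of the 12th; with `allow_current=True` it ends at the candle of the 13th. Asking for a candle at the very first timestamp raises the exception, which mirrors the start-of-backtest case mentioned in the note.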

get_single_value(asset_id, when, data_lag_tolerance, kind='close', asset_name=None, link=None)[source]#

Get a single value for a single pair/asset at a specific point of time.

The data may be sparse. There might not be a sample available at the same time point or at the immediately preceding time point. In this case, the method looks back for the previous data point within the tolerance time range.

This method should be relatively fast and optimised for any price, volume and liquidity queries.

Example:

# TODO

Parameters:

asset_id (int) – Trading pair id
when (pandas._libs.tslibs.timestamps.Timestamp | datetime.datetime) – Timestamp to query
kind – One of OHLC data points: “open”, “close”, “low”, “high”
tolerance – If there is no liquidity sample available at the exact timepoint, look to the past to get the nearest sample. For example, if the candle time interval is 5 minutes and look_back_timeframes is 10, then accept a candle that is a maximum of 50 minutes before the timepoint.
asset_name (str | None) –
Used in exception messages.

If not given use asset_id.
link (str | None) –
Link to the asset page.

Used in exception messages.

If not given use <link unavailable>.
data_lag_tolerance (Timedelta) –

Returns:

Return (value, delay) tuple.

We always return a value. In the error cases an exception is raised. The delay is the timedelta between the wanted timestamp and the actual timestamp of the sampled value.

Candles are always timestamped by their opening.

Raises:

NoDataAvailable – There were no samples available with the given condition.

Return type:

Tuple[float, Timedelta]
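
The look-back within data_lag_tolerance and the (value, delay) return shape can be sketched like this. It is a standard-library illustration only: the `single_value` helper is hypothetical, and LookupError stands in for NoDataAvailable.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Sparse 5-minute close prices for one pair (hypothetical data)
timestamps = [datetime(2021, 10, 11, 12, m) for m in (0, 5, 20)]
closes = [1.00, 1.02, 1.05]


def single_value(timestamps, values, when, data_lag_tolerance):
    """Return (value, delay) for the latest sample at or before ``when``."""
    pos = bisect_right(timestamps, when)
    if pos == 0:
        raise LookupError(f"No samples at or before {when}")
    delay = when - timestamps[pos - 1]
    if delay > data_lag_tolerance:
        raise LookupError(f"Nearest sample is {delay} old, tolerance is {data_lag_tolerance}")
    return values[pos - 1], delay


# 12:15 has no candle; the 12:05 candle is 10 minutes old, within tolerance
value, delay = single_value(
    timestamps, closes, datetime(2021, 10, 11, 12, 15), timedelta(minutes=15)
)
```

Tightening the tolerance to five minutes for the same query would raise instead of returning a stale value.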

forward_fill(columns=('open', 'close'), drop_other_columns=True)[source]#

Forward-fill sparse OHLCV candle data.

Forward-fills the missing candle values for non-existing candles. Trading Strategy data does not have candles unless there were actual trades happening in the markets.

See tradingstrategy.utils.forward_fill for details.

Note

Does not touch the original self.df DataFrame in any way. Only self.pairs is modified with forward-filled data.

Parameters:

columns (Tuple[str]) –
Columns to fill.

To save memory and speed, only fill the columns you need. Usually open and close are enough and also filled by default.
drop_other_columns –
Remove other columns before forward-fill to save memory.

The resulting DataFrame will only have columns listed in columns parameter.

The removed columns include ones like high and low, but also Trading Strategy specific columns like start_block and end_block. It’s unlikely we are going to need forward-filled data in these columns.
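
The idea of forward filling sparse candles, restricted to selected columns, can be sketched with the standard library as below. The helper simply repeats the previous candle's values in each empty time slot; the fill rules in tradingstrategy.utils.forward_fill may differ, and the function name, the `step` parameter and the data here are hypothetical.

```python
from datetime import datetime, timedelta

# Sparse daily candles: no trades on the 12th and 13th (hypothetical data)
sparse = {
    datetime(2021, 10, 11): {"open": 1.00, "close": 1.10, "high": 1.15},
    datetime(2021, 10, 14): {"open": 1.20, "close": 1.15, "high": 1.25},
}


def forward_fill(candles, step, columns=("open", "close")):
    """Fill missing time slots by repeating the previous candle's columns."""
    filled = {}
    ts, end = min(candles), max(candles)
    last = None
    while ts <= end:
        if ts in candles:
            # Only the requested columns are kept, the others are dropped
            last = {col: candles[ts][col] for col in columns}
        filled[ts] = dict(last)
        ts += step
    return filled


dense = forward_fill(sparse, step=timedelta(days=1))
```

The input mapping is left untouched and the result contains only the requested columns, which matches the memory-saving intent of drop_other_columns.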

classmethod create_from_single_pair_dataframe(df, bucket=None)[source]#

Construct universe based on a single trading pair data.

Useful for synthetic data/testing.

Parameters:

df (DataFrame) –
bucket (tradingstrategy.timebucket.TimeBucket | None) –

Return type:

PairGroupedUniverse

classmethod create_from_multiple_candle_dataframes(dfs, autoheal_pair_limit=200)[source]#

Construct universe based on multiple trading pairs.

Useful for synthetic data/testing.

Parameters: dfs (Iterable[DataFrame]) – List of dataframes/series where each trading pair is an isolated OHLCV data feed.
Return type: GroupedCandleUniverse