PairGroupedUniverse

tradingstrategy.utils.groupeduniverse.PairGroupedUniverse Python class in the Trading Strategy framework.

class PairGroupedUniverse

Bases: object

A base class for manipulating columnar price/liquidity data by pair.

The server streams the data for all pairs in a single continuous, time-indexed format. For most use cases, we want to look up and manipulate data by pair. To achieve this, we use Pandas pd.GroupBy and regroup the data on the client side.

This works for

  • OHLCV candles

  • Liquidity candles

By default, the input pd.DataFrame is sorted by the timestamp column, and this column is then made the index. This is not optimised: the operation is not done in place, so the data is copied.
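The grouping described above can be sketched with plain pandas. This is a minimal illustration, not the library's internal code; the column names pair_id, timestamp and close, and the sample values, are assumptions made for the example.

```python
import pandas as pd

# Made-up flat data, as the server might stream it: all pairs interleaved in one frame
raw = pd.DataFrame({
    "pair_id": [2, 1, 1, 2],
    "timestamp": pd.to_datetime(["2021-10-12", "2021-10-11", "2021-10-12", "2021-10-11"]),
    "close": [20.5, 1.1, 1.2, 20.0],
})

# Sort by time and promote the timestamp column to the index (returns a copy, not in place)
df = raw.sort_values("timestamp").set_index("timestamp")

# Group the rows by trading pair so each pair can be looked up on its own
pairs = df.groupby("pair_id")

pair_1 = pairs.get_group(1)
print(pair_1["close"].tolist())  # [1.1, 1.2]
```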


__init__(df, time_bucket=TimeBucket.d1, timestamp_column='timestamp', index_automatically=True, fix_wick_threshold=(0.1, 1.9))
Parameters:
  • df (DataFrame) – The input price/liquidity data for all pairs.

  • time_bucket – The candle (bar) size we are operating at. Defaults to daily. TODO: Currently not used. Will be removed in a future version.

  • timestamp_column – Which column to use to build the time index. Used for QSTrader / Backtrader compatibility.

  • index_automatically – Convert the index to a time series index. You might want to disable this for QSTrader-style data.

  • fix_wick_threshold (tuple | None) –

    Apply an abnormal high/low wick fix filter.

    Maximum allowed low/high wick as a ratio of close. By default, fix values where low is 90% lower than close or high is 90% higher than close, i.e. the thresholds are 0.1 × close and 1.9 × close.

    See tradingstrategy.utils.groupeduniverse.fix_bad_wicks() for more information.
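A minimal sketch of what the fix_wick_threshold filter does, written with plain pandas. clamp_bad_wicks() is a hypothetical helper, and replacing an abnormal wick with the close price is an assumption; fix_bad_wicks() is the authoritative implementation.

```python
import pandas as pd

def clamp_bad_wicks(df: pd.DataFrame, threshold=(0.1, 1.9)) -> pd.DataFrame:
    """Hypothetical helper illustrating the fix_wick_threshold idea.

    Assumption: abnormal wicks are replaced with the close price.
    """
    low_ratio, high_ratio = threshold
    df = df.copy()  # do not mutate the caller's frame
    # With the default threshold: low more than 90% below close
    bad_low = df["low"] < df["close"] * low_ratio
    # With the default threshold: high more than 90% above close
    bad_high = df["high"] > df["close"] * high_ratio
    df.loc[bad_low, "low"] = df.loc[bad_low, "close"]
    df.loc[bad_high, "high"] = df.loc[bad_high, "close"]
    return df

candles = pd.DataFrame({
    "low": [0.95, 0.001, 0.9],
    "high": [1.05, 1.1, 50.0],
    "close": [1.0, 1.0, 1.0],
})
fixed = clamp_bad_wicks(candles)
print(fixed["low"].tolist())   # [0.95, 1.0, 0.9]
print(fixed["high"].tolist())  # [1.05, 1.1, 1.0]
```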

Methods

  • __init__(df[, time_bucket, ...]) – Create a grouped universe from a DataFrame.

  • clear_cache() – Clear candles cached by pair.

  • forward_fill([columns, drop_other_columns]) – Forward-fill sparse OHLCV candle data.

  • get_all_pairs() – Iterate over all samples, one DataFrame per trading pair.

  • get_all_samples_by_range(start, end) – Get candles/samples for all pairs within a time range.

  • get_all_samples_by_timestamp(ts) – Get candles/samples for all pairs at a certain point in time.

  • get_columns() – Get column names from the underlying pandas.GroupBy object.

  • get_last_entries_by_pair_and_timestamp(...) – Get samples for a single pair before a timestamp.

  • get_pair_count() – Return the number of pairs in this dataset.

  • get_pair_ids() – Get the ids of all pairs present in the dataset.

  • get_prior_timestamp(ts) – Get the latest timestamp in the index that is at or before the given timestamp.

  • get_sample_count() – Return the dataset size: the total number of samples across all pairs.

  • get_samples_by_pair(pair_id) – Get samples for a single pair.

  • get_single_pair_data([timestamp, ...]) – Get all candles/liquidity samples for the only pair in the universe up to a certain timestamp.

  • get_timestamp_range([use_timezone]) – Return the time range covered by the data.

  • iterate_samples_by_pair_range(start, end) – Iterate over candles/samples within a time range, one pair at a time.


clear_cache()

Clear candles cached by pair.

get_columns()

Get column names from the underlying pandas.GroupBy object.

Return type:

Index

get_sample_count()

Return the dataset size: the total number of samples across all pairs.

Return type:

int

get_pair_count()

Return the number of pairs in this dataset.

Return type:

int

get_samples_by_pair(pair_id)

Get samples for a single pair.

After the samples have been extracted, set timestamp as the index for the data.

Returns:

The DataFrame group for the pair.

Raises:

KeyError – If we do not have data for pair_id

Parameters:

pair_id (int) –

Return type:

DataFrame
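A sketch of the lookup described above using plain pandas. samples_for_pair() is a hypothetical stand-in; it assumes the grouped data carries a timestamp column that is promoted to the index after extraction, and relies on DataFrameGroupBy.get_group() raising KeyError for an unknown pair.

```python
import pandas as pd

df = pd.DataFrame({
    "pair_id": [1, 1, 2],
    "timestamp": pd.to_datetime(["2021-10-11", "2021-10-12", "2021-10-11"]),
    "close": [1.1, 1.2, 20.0],
})
grouped = df.groupby("pair_id")

def samples_for_pair(pair_id: int) -> pd.DataFrame:
    """Hypothetical stand-in for get_samples_by_pair(): one pair, time-indexed."""
    group = grouped.get_group(pair_id)  # raises KeyError for unknown pairs
    # After extraction, use the timestamp as the index
    return group.set_index("timestamp")

print(len(samples_for_pair(1)))  # 2
try:
    samples_for_pair(99)
except KeyError:
    print("no data for pair 99")
```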

get_last_entries_by_pair_and_timestamp(pair_id, timestamp, small_time=Timedelta('0 days 00:00:01'))

Get samples for a single pair before a timestamp.

Return a DataFrame slice containing all datapoints before the timestamp.

Parameters:
  • pair_id (int) – Integer id for a trading pair

  • timestamp (Timestamp) – Get all samples before this timestamp, excluding the sample at this timestamp itself.

Returns:

DataFrame that contains samples for a single trading pair.

Indexed by timestamp.

Raises:

KeyError – If we do not have data for pair_id

Return type:

DataFrame
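One way to exclude the sample at timestamp itself is to slice up to timestamp - small_time, because label-based slicing in pandas is inclusive. This is an assumption about the role of small_time, which the parameter list above does not document; entries_before() is a hypothetical helper.

```python
import pandas as pd

pair_df = pd.DataFrame(
    {"close": [1.0, 1.1, 1.2]},
    index=pd.to_datetime(["2021-10-11", "2021-10-12", "2021-10-13"]),
)

def entries_before(df: pd.DataFrame, timestamp: pd.Timestamp,
                   small_time: pd.Timedelta = pd.Timedelta(seconds=1)) -> pd.DataFrame:
    """Hypothetical helper: rows strictly before `timestamp`.

    Assumption: small_time is subtracted so that inclusive .loc slicing
    does not pick up a sample sitting exactly at `timestamp`.
    """
    return df.loc[: timestamp - small_time]

before = entries_before(pair_df, pd.Timestamp("2021-10-13"))
print(before["close"].tolist())  # [1.0, 1.1]
```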

get_all_pairs()

Iterate over all samples (candles or liquidity), one DataFrame per trading pair.

Return type:

Iterable[Tuple[int, DataFrame]]

get_pair_ids()

Get the ids of all pairs present in the dataset.

Return type:

Iterable[int]
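The pair-level accessors map naturally onto pandas groupby operations. The correspondences noted in the comments below are assumptions based on the method descriptions, not the library's code, and the sample data is made up.

```python
import pandas as pd

# Made-up samples for two pairs
df = pd.DataFrame(
    {"pair_id": [1, 2, 1], "close": [1.1, 20.0, 1.2]},
    index=pd.to_datetime(["2021-10-11", "2021-10-11", "2021-10-12"]),
)
grouped = df.groupby("pair_id")

pair_ids = list(grouped.groups)  # roughly what get_pair_ids() describes
pair_count = grouped.ngroups      # roughly what get_pair_count() describes
sample_count = len(df)            # roughly what get_sample_count() describes

# Roughly what get_all_pairs() describes: one (pair_id, DataFrame) tuple per pair
for pair_id, pair_df in grouped:
    print(pair_id, len(pair_df))
```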

get_all_samples_by_timestamp(ts)

Get list of candles/samples for all pairs at a certain timepoint.

Raises:

KeyError – The universe does not contain a sample for a given timepoint

Returns:

A DataFrame that contains candles/samples at the specific timestamp.

Parameters:

ts (Timestamp) –

Return type:

DataFrame
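A sketch of a timestamp lookup across all pairs, assuming a frame indexed by timestamp with a pair_id column. samples_at() is hypothetical; the list indexer df.loc[[ts]] keeps the result a DataFrame even when only one pair has a sample, and raises KeyError when the timestamp is missing from the index.

```python
import pandas as pd

df = pd.DataFrame(
    {"pair_id": [1, 2, 1, 2], "close": [1.1, 20.0, 1.2, 20.5]},
    index=pd.to_datetime(["2021-10-11", "2021-10-11", "2021-10-12", "2021-10-12"]),
)

def samples_at(df: pd.DataFrame, ts: pd.Timestamp) -> pd.DataFrame:
    # Double brackets always return a DataFrame with every row at `ts`;
    # a timestamp missing from the index raises KeyError
    return df.loc[[ts]]

snapshot = samples_at(df, pd.Timestamp("2021-10-12"))
print(snapshot["pair_id"].tolist())  # [1, 2]
```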

get_all_samples_by_range(start, end)

Get list of candles/samples for all pairs at a certain range.

Useful to get the last few samples for multiple pairs.

Example:

import pandas as pd
from tradingstrategy.candle import GroupedCandleUniverse
from tradingstrategy.timebucket import TimeBucket

# client is an already configured tradingstrategy.client.Client instance

# Set up timestamps for a 3-week range, with one week in the middle
end = pd.Timestamp('2021-10-25 00:00:00')
start = pd.Timestamp('2021-10-11 00:00:00')
middle = start + (end - start) / 2

# Get weekly candles
raw_candles = client.fetch_all_candles(TimeBucket.d7).to_pandas()
candle_universe = GroupedCandleUniverse(raw_candles)
candles = candle_universe.get_all_samples_by_range(start, end)

# We have pair data for 3 different weeks
assert len(candles.index.unique()) == 3

# Each week has its own set of candles broken down by pair,
# and each candle can be uniquely addressed by its pair_id
assert len(candles.loc[start]) >= 1000
assert len(candles.loc[middle]) >= 1000
assert len(candles.loc[end]) >= 1000

Parameters:
  • start (Timestamp) – start of the range (inclusive)

  • end (Timestamp) – end of the range (inclusive)

Returns:

A DataFrame that contains candles/samples for all pairs at the range.

Return type:

DataFrame

iterate_samples_by_pair_range(start, end)

Iterate over candles/samples within a time range, one pair at a time.

Useful to get the last few samples for multiple pairs.

Example:

import pandas as pd
from tradingstrategy.candle import GroupedCandleUniverse
from tradingstrategy.timebucket import TimeBucket

# client is an already configured tradingstrategy.client.Client instance
raw_candles = client.fetch_all_candles(TimeBucket.d7).to_pandas()
candle_universe = GroupedCandleUniverse(raw_candles)

# Calibrate our week
random_date = pd.Timestamp("2021-10-29")
end = candle_universe.get_prior_timestamp(random_date)
assert end == pd.Timestamp("2021-10-25")

# Because we are using weekly candles,
# and start and end are inclusive endpoints,
# we should get 3 weeks of samples
start = pd.Timestamp(end) - pd.Timedelta(weeks=2)

for pair_id, pair_df in candle_universe.iterate_samples_by_pair_range(start, end):
    # Because of missing samples, some pairs may have different ranges.
    # In this example, we iterate 3-week ranges, so we can have
    # 1, 2 or 3 weekly candles.
    # If there was no data at all, pair_id is not present in the result.
    range_start = pair_df.index[0]
    range_end = pair_df.index[-1]
    assert range_start <= range_end
    # Momentum over the full range: relative change from the first open to the last close
    first_candle = pair_df.iloc[0]
    last_candle = pair_df.iloc[-1]
    momentum = last_candle["close"] / first_candle["open"] - 1

Parameters:
  • start (Timestamp) – start of the range (inclusive)

  • end (Timestamp) – end of the range (inclusive)

Returns:

DataFrame.groupby result. Iterating over it yields (pair_id, DataFrame) tuples.

Return type:

DataFrame
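The "DataFrame.groupby result" can be illustrated with plain pandas: filter rows to the inclusive [start, end] window, then group by pair. This sketch assumes a timestamp index and a pair_id column with made-up values, and computes momentum as last close / first open - 1.

```python
import pandas as pd

df = pd.DataFrame(
    {"pair_id": [1, 2, 1, 1], "open": [1.0, 20.0, 1.1, 1.2], "close": [1.1, 20.5, 1.2, 1.3]},
    index=pd.to_datetime(["2021-10-11", "2021-10-11", "2021-10-18", "2021-10-25"]),
)
start, end = pd.Timestamp("2021-10-11"), pd.Timestamp("2021-10-25")

# Keep rows inside the inclusive [start, end] window, then group by pair
in_range = df[(df.index >= start) & (df.index <= end)]

momentum = {}
for pair_id, pair_df in in_range.groupby("pair_id"):
    first_open = pair_df["open"].iloc[0]
    last_close = pair_df["close"].iloc[-1]
    momentum[pair_id] = last_close / first_open - 1

print(momentum)
```

Pair 2 has only one candle in the window, so its momentum is computed from that single candle's open and close.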

get_timestamp_range(use_timezone=False)

Return the time range covered by the data.

Parameters:

use_timezone – If set, the resulting timestamps have their timezone set to UTC. Otherwise, naive timestamps are returned.

Returns:

(start timestamp, end timestamp) tuple, UTC-timezone aware if use_timezone is set. If the DataFrame is empty, return (None, None).

Return type:

Tuple[Optional[Timestamp], Optional[Timestamp]]
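A hypothetical re-implementation, for illustration only, of the behaviour described above: minimum and maximum of the timestamp index, optional UTC localisation, and (None, None) for an empty frame. It assumes the index holds naive timestamps.

```python
from typing import Optional, Tuple

import pandas as pd

def timestamp_range(df: pd.DataFrame, use_timezone: bool = False
                    ) -> Tuple[Optional[pd.Timestamp], Optional[pd.Timestamp]]:
    """Hypothetical sketch of get_timestamp_range(), not the library code."""
    if df.empty:
        return None, None
    start, end = df.index.min(), df.index.max()
    if use_timezone:
        # Assumes naive timestamps; attach UTC to them
        start, end = start.tz_localize("UTC"), end.tz_localize("UTC")
    return start, end

df = pd.DataFrame({"close": [1.0, 1.2]},
                  index=pd.to_datetime(["2021-10-25", "2021-10-11"]))
print(timestamp_range(df))
print(timestamp_range(df, use_timezone=True))
```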

get_prior_timestamp(ts)

Get the latest timestamp in the index that is at or before the given timestamp.

This lets us align weekly, 4-hour and other indexes to any given time.

Example:

import pandas as pd
from tradingstrategy.candle import GroupedCandleUniverse
from tradingstrategy.timebucket import TimeBucket

# client is an already configured tradingstrategy.client.Client instance
raw_candles = client.fetch_all_candles(TimeBucket.d7).to_pandas()
candle_universe = GroupedCandleUniverse(raw_candles)

# Calibrate our week
random_date = pd.Timestamp("2021-10-29")
weekly_ts_before = candle_universe.get_prior_timestamp(random_date)

assert weekly_ts_before == pd.Timestamp("2021-10-25")

Returns:

The latest timestamp in the index that is at or before the given timestamp.

Parameters:

ts (Timestamp) –

Return type:

Timestamp
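Without the library, the same calibration can be sketched with pandas Index.get_indexer(..., method="pad"), which returns the position of the last label at or before each target, or -1 if there is none. prior_timestamp() is hypothetical, whether the library uses this exact call is an assumption, and the method requires a sorted, unique index.

```python
import pandas as pd

weekly_index = pd.DatetimeIndex(["2021-10-11", "2021-10-18", "2021-10-25", "2021-11-01"])

def prior_timestamp(index: pd.DatetimeIndex, ts: pd.Timestamp) -> pd.Timestamp:
    # method="pad" picks the last index label at or before ts
    pos = index.get_indexer([ts], method="pad")[0]
    if pos == -1:
        raise KeyError(f"No timestamp at or before {ts}")
    return index[pos]

print(prior_timestamp(weekly_index, pd.Timestamp("2021-10-29")))  # 2021-10-25 00:00:00
```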

get_single_pair_data(timestamp=None, sample_count=None, allow_current=False, raise_on_not_enough_data=True, time_range_epsilon_seconds=0.5)

Get all candles/liquidity samples for the only pair in the universe up to a certain timestamp.

A shortcut method for trading strategies that trade only one pair. Designed to work in both backtesting and live trading when accessing candle data.

Note

By default, get_single_pair_data() returns the candles prior to the timestamp; this behavior can be changed with get_single_pair_data(allow_current=True). At the start of a backtest, no previous candle is available yet, so this function may raise NoDataAvailable.

Parameters:
  • timestamp (Optional[Timestamp]) – Get samples up to this timestamp, including all previous samples.

  • allow_current –

    Allow reading the candle exactly at the timestamp. Reading the candle of your current strategy cycle timestamp is usually a mistake.

    In backtesting, reading the candle at the current timestamp introduces forward-looking bias. In live trading, reading the candle at the current timestamp may give you no candle or an incomplete candle (trades are still piling up on it).

  • sample_count (Optional[int]) –

    Minimum candle/liquidity sample count needed.

    Limit the result to the last N candles before the timestamp.

    If the data does not have enough samples before the timestamp, raise NoDataAvailable.

  • raise_on_not_enough_data –

    Raise an error if no data is available.

    This can be e.g. because the trading pair has

  • time_range_epsilon_seconds – The time delta epsilon we use to distinguish between the “current” and the “previous” candle.

Raises:

NoDataAvailable

Raised when there is no data available in the range.

Set raise_on_not_enough_data=False to return an empty DataFrame instead.

Return type:

DataFrame
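The selection rules above can be summarised in a short pandas sketch. single_pair_window() and the local NoDataAvailable class are hypothetical stand-ins, and the way time_range_epsilon_seconds is applied here is an assumption: the slice simply stops epsilon seconds before timestamp when allow_current is false.

```python
import pandas as pd

class NoDataAvailable(Exception):
    """Hypothetical stand-in for the library's exception of the same name."""

def single_pair_window(df, timestamp, sample_count=None, allow_current=False,
                       raise_on_not_enough_data=True, time_range_epsilon_seconds=0.5):
    """Hypothetical sketch of the documented selection rules, not the library code."""
    if allow_current:
        # Include a candle sitting exactly at `timestamp`
        window = df.loc[:timestamp]
    else:
        # Stop just before `timestamp` so the current, possibly incomplete, candle is skipped
        epsilon = pd.Timedelta(seconds=time_range_epsilon_seconds)
        window = df.loc[: timestamp - epsilon]
    if sample_count is not None:
        if raise_on_not_enough_data and len(window) < sample_count:
            raise NoDataAvailable(f"Needed {sample_count} samples, got {len(window)}")
        # Keep only the last N samples
        window = window.iloc[-sample_count:]
    return window

candles = pd.DataFrame(
    {"close": [1.0, 1.1, 1.2, 1.3]},
    index=pd.date_range("2021-10-11", periods=4, freq="D"),
)
ts = pd.Timestamp("2021-10-14")
print(single_pair_window(candles, ts, sample_count=2)["close"].tolist())  # [1.1, 1.2]
print(single_pair_window(candles, ts, sample_count=2, allow_current=True)["close"].tolist())  # [1.2, 1.3]
```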

forward_fill(columns=('open', 'close'), drop_other_columns=True)

Forward-fill sparse OHLCV candle data.

Forward-fills the missing values for candles that do not exist. Trading Strategy data contains a candle only if actual trades happened in the market during that time bucket.

See tradingstrategy.utils.forward_fill for details.

Note

Does not touch the original self.df DataFrame in any way. Only self.pairs is modified with forward-filled data.

Parameters:
  • columns (Tuple[str]) –

    Columns to fill.

    To save memory and time, only fill the columns you need. Usually open and close are enough, and they are filled by default.

  • drop_other_columns

    Remove other columns before forward-fill to save memory.

    The resulting DataFrame will only have columns listed in columns parameter.

    The removed columns include ones like high and low, but also Trading Strategy specific columns like start_block and end_block. It’s unlikely we are going to need forward-filled data in these columns.
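A plain pandas sketch of per-pair forward filling with columns=("open", "close") and drop_other_columns=True semantics, on made-up daily data. It is a naive ffill of each column over a daily grid and only an assumption about the approach; tradingstrategy.utils.forward_fill may treat individual columns differently.

```python
import pandas as pd

# Sparse daily candles: pair 1 had no trades (so no candle) on 2021-10-12
df = pd.DataFrame(
    {
        "pair_id": [1, 2, 2, 1],
        "open": [1.0, 20.0, 20.4, 1.2],
        "close": [1.1, 20.5, 20.6, 1.3],
        "high": [1.2, 21.0, 21.0, 1.4],
    },
    index=pd.to_datetime(["2021-10-11", "2021-10-11", "2021-10-12", "2021-10-13"]),
)

columns = ("open", "close")
filled = {
    # Keep only the requested columns (like drop_other_columns=True),
    # insert a row for every missing day, then carry the last values forward.
    # The original df is left untouched.
    pair_id: pair_df[list(columns)].asfreq("D").ffill()
    for pair_id, pair_df in df.groupby("pair_id")
}

print(filled[1])
```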