Evenly spaced sample

January 02, 2021

While building a trivia application for 2020’s Video Conferenced Christmas, I had need of a function that would provide an evenly spaced sample from a given list.

I considered random.choices, but the results aren’t evenly spaced. I considered numpy.linspace, but the conversion of its floats to integer list indices results in odd edge-cases that can bunch up values.

In need of a holiday puzzle, I sat down to take care of it myself. If you’re interested in the same sort of thing, read only the docstring and try to build something that achieves the same thing yourself!

import random


def evenly_spaced_sample(lst, total):
    """Return evenly spaced out elements of the given list

    ## Deterministic outcomes

    >>> evenly_spaced_sample([0,1, 2, 3,4,5,6, 7, 8,9], 2)
    [2, 7]

    >>> evenly_spaced_sample([0, 1, 2,3, 4, 5], 2)
    [1, 4]

    >>> evenly_spaced_sample([0,1,2], 1)
    [1]

    >>> evenly_spaced_sample(['a', 'b'], 2)
    ['a', 'b']

    >>> evenly_spaced_sample(['a', 'b', 'c'], 2)
    ['a', 'c']


    ## Non-deterministic outcomes

    >>> sorted(set(
    ...     tuple(evenly_spaced_sample([0,1,2,3,4,5,6,7], 4))
    ...     for x in range(100)
    ... ))
    [(0, 2, 4, 6), (1, 3, 5, 7)]

    >>> sorted(set(
    ...     tuple(evenly_spaced_sample([0, 1, 2,3, 4,5, 6,7, 8, 9], 3))
    ...     for x in range(100)
    ... ))
    [(1, 4, 8), (1, 5, 8)]

    >>> sorted(set(
    ...     tuple(evenly_spaced_sample(['a', 'b'], 1))
    ...     for x in range(100)
    ... ))
    [('a',), ('b',)]
    """

    # Handle the edge cases
    len_lst = len(lst)
    if len_lst == total:  # Requesting all items
        return lst
    elif total == 0:  # Requesting no items
        return []
    elif total < 0 or len_lst < total:  # Requesting too many items
        raise ValueError("Sample larger than population or is negative")

    """
        Imagine a list like [0,1,2,3,4,5,6,7,8,9] and we want 3 evenly spaced
        items. The list length (10) divided by the number of items (3) gives
        us 3.3~, so we know our evenly spaced items must be at least 3 apart
        from one another. There's also one more space we need to assign to one
        of the spaces.
    """

    # Determine the space between items and how many gaps of each size we need
    gap_min = int(len_lst / total)
    gap_max = gap_min + 1
    gap_max_count = len_lst - gap_min * total
    gap_min_count = total - gap_max_count

    # Generate a list of offsets, ensuring that the tail has a min gap
    offsets = [gap_min] * (gap_min_count - 1) + [gap_max] * gap_max_count
    random.shuffle(offsets)
    offsets.append(gap_min)

    """
        Next, we need to make sure that we're not returning [0,3,7] as that's
        what you might call "left aligned". We want our evenly spaced sample
        to be somewhat centered. We can do this by shifting our starting index
        over by half of the last offset (considering the head and the tail to
        be one "space" between items, as though the given list were a ring.
    """

    # The starting index is half the last offset, adjusted if it's not an int
    shift = (offsets[-1] - 1) / 2
    index = int(shift)
    if not shift.is_integer() and shift - index < random.random():
        index += 1

    # Compose and return the response list
    response = []
    for offset in offsets:
        response.append(lst[index])
        index += offset
    return response


if __name__ == "__main__":
    import doctest
    doctest.testmod()

Looking around after writing this, I come across some similar problems (e.g. StackOverflow’s Random list of numbers in a range keeping with a minimum distance, but nothing quite like what I’m trying to achieve. I have no doubt someone out there has written something similar, but sometimes it’s difficult to bring these things to the surface.

Worth noting that the more complex the problem, the more time you’d want to spend on investigating prior work instead of just building it from scratch.