Cell Balancing using a Genetic Algorithm¶

Problem Statement¶

Our study has a global "Audience" (demographic) set that needs to be fulfilled.
We have a limited number of Highlighters.
In our study, we have Cells (pairing of instructions <-> audience)
It's possible for the audiences of each Cell to overlap.
We therefore have a constraint satisfaction problem. Given a finite list of Highlighters, how do we distribute those Highlighters to Cells so that we maximize our distribution and minimize our chance of not being able to fill one of the Cells?

Defining Classes and Configuration:¶

In [13]:

import random 

class Highlighter:
    def __init__(self, name):
        self.name = name
        self.assignable_buckets = []

    def __str__(self):
        return f"Highlighter: {self.name}, assignable_buckets: {self.assignable_buckets}"

# Buckets are what we need to fill.
class Bucket:
    def __init__(self, name, percent_fillable, fill_goal):
        self.name = name
        # How many of the available Highlighters can fit into this bucket
        self.percent_fillable = percent_fillable
        self.highlighters = []
        # How many Highlighters are needed to fill that bucket
        self.fill_goal = fill_goal

    def score(self):
        return 1

# Things to configure
buckets = [
    Bucket("A", 20, 10),
    Bucket("B", 20, 20),
    Bucket("C", 60, 30),
    Bucket("D", 30, 30),
    Bucket("E", 10, 10),
]

NUM_HIGHLIGHTERS = 100

highlighters = [Highlighter(f"H_{i:03d}") for i in range(NUM_HIGHLIGHTERS)]

# Configure randomly which Highlighters can be assigned to which buckets
for bucket in buckets:
    fillable = bucket.percent_fillable / 100.0
    num_highlighters_to_assign = int(NUM_HIGHLIGHTERS * fillable)
    
    hlers_to_add_bucket_to = random.sample(highlighters, num_highlighters_to_assign)
    for h in hlers_to_add_bucket_to:
        h.assignable_buckets.append(bucket.name)

for h in highlighters:
    print(h)

# filter out highlighters that have no possible assignment
highlighters = [h for h in highlighters if len(h.assignable_buckets) > 0]

Highlighter: H_000, assignable_buckets: []
Highlighter: H_001, assignable_buckets: ['C']
Highlighter: H_002, assignable_buckets: ['B', 'C', 'E']
Highlighter: H_003, assignable_buckets: ['D']
Highlighter: H_004, assignable_buckets: ['B', 'C']
Highlighter: H_005, assignable_buckets: ['B', 'C']
Highlighter: H_006, assignable_buckets: []
Highlighter: H_007, assignable_buckets: ['A', 'B']
Highlighter: H_008, assignable_buckets: ['B', 'C']
Highlighter: H_009, assignable_buckets: ['B', 'C']
Highlighter: H_010, assignable_buckets: ['C']
Highlighter: H_011, assignable_buckets: ['C', 'D']
Highlighter: H_012, assignable_buckets: ['C']
Highlighter: H_013, assignable_buckets: ['C', 'D']
Highlighter: H_014, assignable_buckets: []
Highlighter: H_015, assignable_buckets: ['C', 'D']
Highlighter: H_016, assignable_buckets: ['A', 'C', 'D', 'E']
Highlighter: H_017, assignable_buckets: ['C']
Highlighter: H_018, assignable_buckets: ['C', 'D']
Highlighter: H_019, assignable_buckets: ['A', 'D']
Highlighter: H_020, assignable_buckets: []
Highlighter: H_021, assignable_buckets: ['C', 'D']
Highlighter: H_022, assignable_buckets: []
Highlighter: H_023, assignable_buckets: ['C', 'D', 'E']
Highlighter: H_024, assignable_buckets: ['C']
Highlighter: H_025, assignable_buckets: ['B', 'C', 'D']
Highlighter: H_026, assignable_buckets: ['D']
Highlighter: H_027, assignable_buckets: ['A']
Highlighter: H_028, assignable_buckets: ['A', 'C']
Highlighter: H_029, assignable_buckets: ['C']
Highlighter: H_030, assignable_buckets: ['D', 'E']
Highlighter: H_031, assignable_buckets: ['B', 'C']
Highlighter: H_032, assignable_buckets: ['A', 'B']
Highlighter: H_033, assignable_buckets: ['B', 'C']
Highlighter: H_034, assignable_buckets: []
Highlighter: H_035, assignable_buckets: ['A']
Highlighter: H_036, assignable_buckets: []
Highlighter: H_037, assignable_buckets: ['D']
Highlighter: H_038, assignable_buckets: ['C']
Highlighter: H_039, assignable_buckets: ['C']
Highlighter: H_040, assignable_buckets: ['A', 'D']
Highlighter: H_041, assignable_buckets: ['B']
Highlighter: H_042, assignable_buckets: ['C']
Highlighter: H_043, assignable_buckets: []
Highlighter: H_044, assignable_buckets: []
Highlighter: H_045, assignable_buckets: ['B', 'C']
Highlighter: H_046, assignable_buckets: []
Highlighter: H_047, assignable_buckets: []
Highlighter: H_048, assignable_buckets: ['C']
Highlighter: H_049, assignable_buckets: ['C']
Highlighter: H_050, assignable_buckets: ['E']
Highlighter: H_051, assignable_buckets: ['A', 'C', 'E']
Highlighter: H_052, assignable_buckets: ['C']
Highlighter: H_053, assignable_buckets: ['A', 'C']
Highlighter: H_054, assignable_buckets: ['C', 'D']
Highlighter: H_055, assignable_buckets: ['B', 'C', 'D']
Highlighter: H_056, assignable_buckets: ['B']
Highlighter: H_057, assignable_buckets: ['C']
Highlighter: H_058, assignable_buckets: ['C', 'E']
Highlighter: H_059, assignable_buckets: ['C', 'D']
Highlighter: H_060, assignable_buckets: ['A', 'B', 'C']
Highlighter: H_061, assignable_buckets: []
Highlighter: H_062, assignable_buckets: ['A', 'B']
Highlighter: H_063, assignable_buckets: ['D']
Highlighter: H_064, assignable_buckets: ['C']
Highlighter: H_065, assignable_buckets: ['C']
Highlighter: H_066, assignable_buckets: ['C']
Highlighter: H_067, assignable_buckets: ['A', 'E']
Highlighter: H_068, assignable_buckets: ['C']
Highlighter: H_069, assignable_buckets: ['C']
Highlighter: H_070, assignable_buckets: ['B', 'C']
Highlighter: H_071, assignable_buckets: ['A', 'C', 'E']
Highlighter: H_072, assignable_buckets: ['A', 'C']
Highlighter: H_073, assignable_buckets: ['A', 'C', 'D']
Highlighter: H_074, assignable_buckets: ['C', 'D']
Highlighter: H_075, assignable_buckets: ['C', 'E']
Highlighter: H_076, assignable_buckets: ['A']
Highlighter: H_077, assignable_buckets: ['B']
Highlighter: H_078, assignable_buckets: ['C', 'D']
Highlighter: H_079, assignable_buckets: ['A', 'C', 'D']
Highlighter: H_080, assignable_buckets: ['C', 'D']
Highlighter: H_081, assignable_buckets: ['C']
Highlighter: H_082, assignable_buckets: ['D']
Highlighter: H_083, assignable_buckets: ['C', 'D']
Highlighter: H_084, assignable_buckets: []
Highlighter: H_085, assignable_buckets: ['A', 'C', 'D']
Highlighter: H_086, assignable_buckets: ['C', 'D']
Highlighter: H_087, assignable_buckets: ['C']
Highlighter: H_088, assignable_buckets: ['C']
Highlighter: H_089, assignable_buckets: ['C']
Highlighter: H_090, assignable_buckets: ['B', 'C', 'D']
Highlighter: H_091, assignable_buckets: ['C']
Highlighter: H_092, assignable_buckets: []
Highlighter: H_093, assignable_buckets: ['D']
Highlighter: H_094, assignable_buckets: ['A', 'B']
Highlighter: H_095, assignable_buckets: []
Highlighter: H_096, assignable_buckets: []
Highlighter: H_097, assignable_buckets: []
Highlighter: H_098, assignable_buckets: []
Highlighter: H_099, assignable_buckets: ['D']

Visualizing the Buckets¶

In [14]:

import networkx as nx
import matplotlib.pyplot as plt

# Create a graph
G = nx.Graph()

# Add highlighters as nodes
for highlighter in highlighters:
    G.add_node(highlighter.name, type='highlighter')

bucket_colors = {}
for bucket in buckets:
    color = (random.random(), random.random(), random.random())  # Random RGB color
    bucket_colors[bucket.name] = color
    G.add_node(bucket.name, type='bucket', color=color)
    for highlighter in highlighters:
        if bucket.name in highlighter.assignable_buckets:
            G.add_edge(highlighter.name, bucket.name)

# Draw the graph
# pos = nx.spring_layout(G)
# pos = nx.circular_layout(G)
# pos = nx.spectral_layout(G)
pos = nx.kamada_kawai_layout(G)

node_types = nx.get_node_attributes(G, 'type')
node_sizes = [500 if node_types[node] == 'bucket' else 10 for node in G.nodes]  # Larger size for buckets

node_colors = [bucket_colors[node] if node_types[node] == 'bucket' else 'blue' for node in G.nodes]

nx.draw(G, pos, with_labels=False, node_color=node_colors, node_size=node_sizes)
plt.show()

No description has been provided for this image

Defining a Fitness Function¶

Our fitness function needs to somewhat accurately define what a "good" assignment is.

"Good" means:

High percentage of filled buckets

In [15]:

import statistics 

# Fitness function that just optimizes for
# total percentage of assigned Highlighters
def fitness_1(buckets, assignment):
    '''
        buckets: []Bucket
        assignments: dict {Highlighter: Bucket_Name}

    '''

    assigned_buckets = list(assignment.values())

    fill_ratios = []
    for bucket in buckets:
        num_filled = sum(1 for assigned_bucket in assigned_buckets if assigned_bucket == bucket.name)
        fill_ratios.append(min(num_filled / bucket.fill_goal, 1.0))

    return statistics.mean(fill_ratios)

# Same as fitness_1 except it penalizes for overfilling buckets
def fitness_2(buckets, assignment):
    '''
        buckets: []Bucket
        assignments: dict {Highlighter: Bucket_Name}
    '''

    assigned_buckets = list(assignment.values())

    fill_ratios = []
    for bucket in buckets:
        num_filled = sum(1 for assigned_bucket in assigned_buckets if assigned_bucket == bucket.name)
        fill_ratio = num_filled / bucket.fill_goal
        if fill_ratio > 1.0:
            # Apply penalty for fill ratios greater than 1
            penalty = 1.0 - (fill_ratio - 1.0)
            fill_ratios.append(penalty)
        else:
            fill_ratios.append(fill_ratio)

    return statistics.mean(fill_ratios)



def fitness_3(buckets, assignments):
    """
    Calculate the fitness score based on the total number of assigned highlighters divided by the sum of bucket fill goals.
    
    :param buckets: List of Bucket objects
    :param assignments: Dictionary mapping Highlighter objects to bucket names
    :return: Fitness score
    """
    total_assigned_highlighters = len(assignments)
    total_fill_goals = sum(bucket.fill_goal for bucket in buckets)
    
    # Avoid division by zero
    if total_fill_goals == 0:
        return 0.0
    
    # Calculate fitness score
    fitness_score = total_assigned_highlighters / total_fill_goals
    
    return fitness_score

Assignment Generation¶

Random¶

In [16]:

import random
def generate_random_assignment(buckets, highlighters):
    '''
        Each Highlighter can only be assigned to 1 Bucket.
    '''
    assignment = {}
    for h in highlighters:
        # Randomly choose one of the highlighter's assignable buckets
        chosen_bucket = random.choice(h.assignable_buckets)
        assignment[h] = chosen_bucket
    return assignment

assignment = generate_random_assignment(buckets, highlighters)

Weighted Random Assignment - Prioritizes Fill Needs¶

In [17]:

import random

def generate_weighted_assignment(buckets, highlighters):
    '''
        Generate assignments based on Bucket needs with some degree of randomness.
        Each Highlighter can only be assigned to one of its assignable buckets.
    '''
    assignment = {}
    
    # Dictionary to keep track of total filled per bucket
    filled_per_bucket = {bucket.name: 0 for bucket in buckets}
    
    # Calculate total filled per bucket
    for h in highlighters:
        if h in assignment:
            filled_per_bucket[assignment[h]] += 1

    random.shuffle(highlighters)
    for h in highlighters:
        # Filter buckets to only include those that are assignable to the highlighter
        assignable_buckets = [bucket for bucket in buckets if bucket.name in h.assignable_buckets]

        # If there are no assignable buckets, skip assignment for this highlighter
        if not assignable_buckets:
            continue
        
        # Calculate remaining needs ratios for each bucket
        remaining_ratios = [(bucket.fill_goal - filled_per_bucket[bucket.name]) / bucket.fill_goal for bucket in assignable_buckets]
        
        # Calculate weights based on remaining needs ratios
        weights = remaining_ratios
        
        # Calculate cumulative weights
        cumulative_weights = [sum(weights[:i+1]) for i in range(len(weights))]
        
        # Choose a random number within the total weight
        rand_num = random.uniform(0, sum(weights))
        
        # Find the corresponding bucket based on the random number
        chosen_bucket = None
        for i, cum_weight in enumerate(cumulative_weights):
            if rand_num <= cum_weight:
                chosen_bucket = assignable_buckets[i]
                break
        if chosen_bucket:
            assignment[h] = chosen_bucket.name
        
            # Update total filled per bucket
            filled_per_bucket[chosen_bucket.name] += 1
        else:
            assignment[h] = ''
    return assignment

assignment = generate_weighted_assignment(buckets, highlighters)

Visualize the Assignment¶

In [18]:

import matplotlib.pyplot as plt

def generate_bar_graphs(buckets, assignment):
    # Initialize lists to store fill goals and total filled for each bucket
    fill_goals = []
    total_filled = []

    # Iterate over each bucket to gather fill goals and calculate total filled
    for bucket in buckets:
        fill_goals.append(bucket.fill_goal)
        total_filled.append(sum(1 for assigned_bucket in assignment.values() if assigned_bucket == bucket.name))

    # Create bar graph
    plt.figure(figsize=(10, 6))
    plt.bar(range(len(buckets)), total_filled, color='blue', label='Total Filled')
    plt.plot(fill_goals, color='red', linestyle='--', marker='o', label='Fill Goal')
    
    # Set labels and title
    plt.xlabel('Bucket')
    plt.ylabel('Count')
    plt.title('Bucket Fill Status')
    plt.xticks(range(len(buckets)), [bucket.name for bucket in buckets])
    plt.legend()
    plt.tight_layout()

    # Show plot
    plt.show()

generate_bar_graphs(buckets, assignment)

Determine the Fitness of this Assignment¶

In [19]:

fitness_1(buckets, assignment)

Out[19]:

0.7666666666666666

In [20]:

## Average Fitness of Random Assignment

scores = []
for i in range(1000):
    assignment = generate_random_assignment(buckets, highlighters)
    scores.append(fitness_1(buckets, assignment))

print(statistics.mean(scores))

0.70015

Genetic Algorithm¶

In [21]:

class GeneticAlgorithmRunner:
    def __init__(self, buckets, highlighters, mutation_rate=0.01, crossover_rate=0.1, cull_rate=0.1, max_pop=1000, max_epochs=50, fitness_function = fitness_1, generator_algorithm=generate_random_assignment):
        self.buckets = buckets
        self.highlighters = highlighters
        self.assignments = []
        self.cull_rate = cull_rate
        self.mutation_rate = mutation_rate
        self.crossover_rate = crossover_rate
        self.max_epochs = max_epochs
        self.max_population = max_pop
        self.fitness_function = fitness_function
        self.generator_algorithm = generator_algorithm

        self.population = []
                # Generate N random assignments
        for i in range(self.max_population):
            self.population.append(generator_algorithm(self.buckets, self.highlighters))

            # Select some % of fitnesses based on weights
            # selected_indices = random.choices(range(len(fitnesses)), weights=weights, k=int(self.cull_rate * len(fitnesses)))

    # Selects the top candidates 
    def cull(self, fitnesses, population):
        # Determine the number of individuals to select
        num_to_select = int(len(fitnesses) * self.cull_rate)
        
        # Create a list of indices and fitnesses
        indexed_fitnesses = list(enumerate(fitnesses))
        
        # Sort by fitness in descending order
        indexed_fitnesses.sort(key=lambda x: x[1], reverse=True)
        
        # Extract the top individuals based on sorted fitness scores
        selected_indices = [idx for idx, _ in indexed_fitnesses[:num_to_select]]
        
        # Select the fitnesses and population members by these indices
        selected_fitnesses = [fitnesses[i] for i in selected_indices]
        selected_population = [population[i] for i in selected_indices]
        
        return selected_fitnesses, selected_population

    def increase_population(self):
        """
        Increase the current population by breeding - apply mutation and crossover.
        """
        # Calculate the number of individuals needed to reach the maximum population size
        num_individuals_needed = self.max_population - len(self.population)

        original_population = self.population
        
        # Generate new individuals until the population reaches the maximum size
        while len(self.population) < self.max_population:
            # Select two parents randomly from the current population
            parent1, parent2 = random.sample(original_population, 2)
            
            offspring = self.breed(parent1, parent2)
            
            # Add the offspring to the population
            self.population.append(offspring)

    # Mutate a given individual's genes
    def mutate(self, assignment):
        for highlighter, bucket in assignment.items():
            if random.random() < self.mutation_rate:
                assignment[highlighter] = random.choice(highlighter.assignable_buckets)
        return assignment

    def crossover(self, assignment1, assignment2):
        """
        Perform crossover between two parents to produce a new individual.
        
        :param assignment1: Dictionary representing the assignment of highlighters to buckets for parent 1
        :param assignment2: Dictionary representing the assignment of highlighters to buckets for parent 2
        :return: Dictionary representing the assignment of highlighters to buckets for the offspring
        """
        # Initialize the offspring assignment
        offspring_assignment = {}

        # Randomly select genes (highlighter-bucket pairs) from each parent
        for highlighter in assignment1:
            if random.random() < 0.5:  # 50% chance to inherit from parent 1
                offspring_assignment[highlighter] = assignment1[highlighter]
            else:  # 50% chance to inherit from parent 2
                offspring_assignment[highlighter] = assignment2[highlighter]

        return offspring_assignment
    
    def breed(self, assignment1, assignment2):
        child_assignment = {}

        # Randomly make child a copy of either parent1 or parent2
        if random.random() < 0.5:
            child_assignment = assignment1
        else:
            child_assignment = assignment2

        # Probability of getting both parent's genes
        if random.random() < self.crossover_rate:
            child_assignment = self.crossover(assignment1, assignment2)

        # Mutate the genes
        child_assignment = self.mutate(child_assignment)        

        return child_assignment

    def get_best(self):
        """
        Return the highest score of the individuals in the current population.
        """
        best_score = float('-inf')  # Initialize with negative infinity to find the maximum score
        best_assignment = None
        
        # Iterate over each individual in the population
        for assignment in self.population:
            # Calculate the fitness score for the individual using the provided fitness function
            score = self.fitness_function(self.buckets, assignment)
            
            # Update the best score if the current score is higher
            if score > best_score:
                best_score = score
                best_assignment = assignment
        
        return best_assignment, best_score
        
    def run(self):
        for i in range(self.max_epochs):
            # Evaluate their fitnesses
            scores = []
            for assignment in self.population:
                scores.append(self.fitness_function(self.buckets, assignment))
    
            # Cull the herd (survival of the fittest!)
            best_scores, best_assignments = self.cull(scores, self.population)
            self.population = best_assignments

            # Breed to restore the population 
            self.increase_population()

            print(f"Epoch: {i}, Best Score: {max(best_scores)}")

            if max(best_scores) >= .9999:
                break

In [22]:

# generator_algorithm = random_assignment
fitness_function = fitness_1
generator_algorithm = generate_random_assignment
max_population = 1000
max_epochs = 50

a = GeneticAlgorithmRunner(buckets, highlighters, mutation_rate=0.01, crossover_rate=1.0, fitness_function=fitness_function, generator_algorithm=generator_algorithm, max_pop=max_population, max_epochs=max_epochs)
a.run()

Epoch: 0, Best Score: 0.8133333333333334
Epoch: 1, Best Score: 0.8266666666666667
Epoch: 2, Best Score: 0.8566666666666667
Epoch: 3, Best Score: 0.8766666666666667
Epoch: 4, Best Score: 0.8766666666666667
Epoch: 5, Best Score: 0.8766666666666667
Epoch: 6, Best Score: 0.8766666666666667
Epoch: 7, Best Score: 0.8766666666666667
Epoch: 8, Best Score: 0.8766666666666667
Epoch: 9, Best Score: 0.8766666666666667
Epoch: 10, Best Score: 0.8766666666666667
Epoch: 11, Best Score: 0.8766666666666667
Epoch: 12, Best Score: 0.8766666666666667
Epoch: 13, Best Score: 0.8766666666666667
Epoch: 14, Best Score: 0.8766666666666667
Epoch: 15, Best Score: 0.8766666666666667
Epoch: 16, Best Score: 0.8766666666666667
Epoch: 17, Best Score: 0.8766666666666667
Epoch: 18, Best Score: 0.8766666666666667
Epoch: 19, Best Score: 0.8766666666666667
Epoch: 20, Best Score: 0.8766666666666667
Epoch: 21, Best Score: 0.8766666666666667
Epoch: 22, Best Score: 0.8766666666666667
Epoch: 23, Best Score: 0.8766666666666667
Epoch: 24, Best Score: 0.8766666666666667
Epoch: 25, Best Score: 0.8766666666666667
Epoch: 26, Best Score: 0.8766666666666667
Epoch: 27, Best Score: 0.8766666666666667
Epoch: 28, Best Score: 0.8766666666666667
Epoch: 29, Best Score: 0.8766666666666667
Epoch: 30, Best Score: 0.8766666666666667
Epoch: 31, Best Score: 0.8766666666666667
Epoch: 32, Best Score: 0.8766666666666667
Epoch: 33, Best Score: 0.8766666666666667
Epoch: 34, Best Score: 0.8766666666666667
Epoch: 35, Best Score: 0.8766666666666667
Epoch: 36, Best Score: 0.8766666666666667
Epoch: 37, Best Score: 0.8766666666666667
Epoch: 38, Best Score: 0.8766666666666667
Epoch: 39, Best Score: 0.8766666666666667
Epoch: 40, Best Score: 0.8766666666666667
Epoch: 41, Best Score: 0.8766666666666667
Epoch: 42, Best Score: 0.8766666666666667
Epoch: 43, Best Score: 0.8766666666666667
Epoch: 44, Best Score: 0.8766666666666667
Epoch: 45, Best Score: 0.8766666666666667
Epoch: 46, Best Score: 0.8766666666666667
Epoch: 47, Best Score: 0.8766666666666667
Epoch: 48, Best Score: 0.8766666666666667
Epoch: 49, Best Score: 0.8766666666666667

In [23]:

assignment, score = a.get_best()

In [24]:

generate_bar_graphs(buckets, assignment)

In [25]:

score

Out[25]:

0.8766666666666667

In [ ]:

Notebooks