
More Genetic Algorithm Optimisation

The knapsack problem deals with a set of items we want to put in a bag, each with an associated weight and value. Crucially, the bag has a weight capacity that the selected items can’t exceed. The solution is therefore the subset of items that fits in the bag with the optimum ‘profit’. A few variations of the problem exist (and even more approaches to solving them) but this post is about applying the genetic algorithm metaheuristic to 0-1 knapsack:

  • Using test datasets from here
  • Genes are encoded as ordered boolean lists of items
  • And the search space is every subset (combination of items)

Other parameters in population initialisation, evaluation, selection and modification were compared using the following base implementation in C++:

  • Generations ‘complete’ when 3 or more chromosomes in the population match the solution, because it is unlikely to get that many duplicates in a population so much smaller than the search space at random
  • The average number of cycles to reach completion is taken from 20 tests, because randomly generating chromosomes in the initialisation stage massively determines the rest of the experiment (potentially giving us a full set of solutions or the poorest subsets each time)

My first problem was in sorting the population; the profit should be maximised while any excess of the subset weight over the capacity should be minimised. The fitness function can’t maximise and minimise simultaneously, so I tried a ternary operation like this:

int evaluate( const vector<bool>& chromosome )
{
    // get the amount the subset's weight exceeds the capacity
    int weightFitness = max( getWeight( chromosome )-capacity, 0 );
    // simply add up the value of the subset
    int profitFitness = getProfit( chromosome );
    // fitness is double the profit IF within capacity ELSE profit-exceedance
    return (weightFitness==0) ? (2*profitFitness) : (profitFitness-weightFitness);
}

It assumes importance for weight and profit which I’d expect to be case-specific, but before experimenting more with the fitness function I carried out the following tests:

  • Population size: 64 | Mutation rate: 0.5 | Mutation: flip 3 random genes = Average cycles: 87

  • Population size: 32 | Mutation rate: 0.5 | Mutation: flip 3 random genes = Average cycles: 243

  • Population size: 64 | Mutation rate: 0.5 | Mutation: flip 2 random genes = Average cycles: 103

  • Population size: 64 | Mutation rate: 0.5 | Mutation: flip 1 random gene = Average cycles: 173

  • Population size: 64 | Mutation rate: 1.0 | Mutation: flip 1 random gene = Average cycles: 94

  • Population size: 64 | Mutation rate: 0.1 | Mutation: flip 1 random gene = Average cycles: 918

  • Population size: 64 | Mutation: flip 1 for worst N-3, 1 more for worst N-9 and 1 more for worst N-16 = Average cycles: 35

Then I simplified the fitness function as follows:

int evaluate( const vector<bool>& chromosome )
{
    // same getProfit and getWeight helpers as before...
    return (0.X*profitFitness) - (1.0*weightFitness);
}

Where X is specified in the results below (each with population size=64 and the last selection/mutation method from before):

  • Fitness: 0.50 for X | Average cycles: 45

  • Fitness: 0.25 for X | Average cycles: 35

  • Fitness: 0.40 for X | Average cycles: ~40


The results show the parameters and functions optimal for this dataset: a large population and fitness-based mutation rates. I haven’t yet added GA features like the crossover modification I used last week, which could optimise the results further. The average number of cycles also varied unreliably, which might be improved by increasing the number of tests (from 20 to 50, for example) or by changing the approach to pseudo-random generation in the initialisation. What the exercise offered most was experience in optimising parameter-based systems, and it even inspired an idea for GA image approximation in my upcoming project!


For Natural Computing (IS53052A)