Artificial Intelligence

And now for a completely inappropriate use of SQL Server

A while back I wrote up a short introductory overview of Genetic Algorithms. Just for the shear, absolute fun of it, I thought I’d implement a basic genetic algorithm within SQL Server and use it to solve a form of the knapsack problem.

Now first a few comments on this. As the title states, this is not really an appropriate use of SQL Server. Genetic algorithms are generally implemented in languages like Java, C++, C#, etc; languages that are good at complex mathematics, string manipulation and have complex data types. I’m not necessarily using efficient, well-performing methods here, UDFa abound. This is not an article on best practices and well-performing code. I’m also doing no error handling, which I would if this were a real system (in a more suitable language)

Still, doing just for the sake of seeing if it’s possible is all sorts of fun. So, without further ado, the knapsack problem, an approximate solution with genetic algorithms in SQL Server. (For the record, this is a multi-constrained, bounded knapsack problem)

The scenario

There’s a bag that has a maximum volume that it can hold and a maximum mass that it can hold (and we assume that we can pack perfectly with no wasted space). There are eight items, all with different masses, different volumes and different values. The goal here is to maximise the total value of all the items contained within the bag.

CREATE TABLE ObjectStatistics (
  ObjectNumber TINYINT NOT NULL,
  Mass NUMERIC(4,2) NOT NULL,
  Volume NUMERIC(4,2) NOT NULL,
  Value NUMERIC(4,2) NOT NULL,
  NumberAvailable TINYINT NOT NULL,
  CONSTRAINT pk_ObjectStats PRIMARY KEY CLUSTERED (ObjectNumber)
);

CREATE TABLE BagStatistics (
  MaxMass NUMERIC(5,2),
  MaxVolume NUMERIC(5,2)
);

INSERT INTO dbo.ObjectStatistics (ObjectNumber, Mass, Volume, Value, NumberAvailable)
VALUES
  (1,0.5,0.45,2,50),
  (2,1.5,0.25,20,10),
  (3,0.1,1.25,8,150),
  (4,0.25,0.1,1,250),
  (5,0.67,0.3,6,100),
  (6,0.34,0.75,18,5),
  (7,1.25,0.25,5,40),
  (8,1.1,0.8,10,25);

INSERT INTO dbo.BagStatistics (MaxMass, MaxVolume)
VALUES  (100, 75);

Those two tables set up the constraints for the scenario, the maximum mass and volume for the bag and the mass, volume, value and maximum number available for each of the items.

(more…)

Genetic Algorithms

Warning. This post has absolutely nothing to do with SQL Server.

What are Genetic Algorithms?

Genetic algorithms are a form of evolutionary computation, a branch of artificial intelligence that focuses on evolving effective or optimal solutions to difficult problems, based on the biological theory of evolution.

Genetic algorithms are, at their core, a search/optimisation technique. They are a way of finding maximum/minimum solutions to problems and, can be effective when there is no algorithmic solution to the problem. An example here would be the ‘Travelling Salesman’ problem.

Genetic algorithms work by taking an initial population of potential solutions (referred to as individuals), selecting a subset of the population that has the highest fitness then using that subset to generate a second generation. From the second generation again a subset with the highest fitness is selected and used to generate a third generation. This repeats until either the ‘fittest’ individual is considered a good enough solution, or until a certain number of generations have passed.

There are advantage to using genetic algorithms to solve problems over more traditional methods like hill climbing.

  • Genetic algorithms can quickly produce good solutions though they may take a lot of time to find the best solution. This is a benefit when the problem is such that the absolute best solution is not necessary, just one that is ‘good enough’
  • They are not susceptible to getting trapped by local maxima.
  • They do not work on the entire search space one potential solution at a time, but rather work on populations of potential solutions, focusing towards more optimal areas of the search space.

A genetic algorithm will almost always find an optimal solution, given enough time. The main downside is that they may take a lot of time to find that optimal solution.

(more…)