Wednesday, November 7, 2007

Using Circular Area Selection

This mini tutorial explains how to use the Spatial extensions of MySql 5.x and demonstrates the high performaces that can be obtained if used in the correct way.

The tutorial is intended to users who know SQL and MySQL in particular.

Step 1. Creation of a Spatial table.


The first step explains how to create a table containing geographic data named Points.

create table Points (
name VARCHAR(20) PRIMARY KEY,
location Point NOT NULL,
description VARCHAR(200),
SPATIAL INDEX(location)
);

This DDL command creates a table named Points containing a set of records
with a name and a location characterized by a Point Geometry.

As you can see a spatial extension can be used as any other MySQL data type
simply including it in the definition of a field.

The base class of every Spatial Type dat is Geometry.

The complete hyerarchy of geometry data type supported by MySQL can be found at

http://dev.mysql.com/doc/refman/4.1/en/spatial-extensions.html

Step 2. Insert data inside the Spatial Table.

This step shows how is easy to insert data inside the Points table.

INSERT INTO Points (name, location) VALUES ( 'point1' , GeomFromText( ' POINT(31.5 42.2) ' ) )

This is just a common SQL INSERT operation, the only news is the use of the function
GeomFromText().
This function takes a string and returns a Geometry Object.
The format of the string is a GIS standard decribed at

http://dev.mysql.com/doc/refman/4.1/en/gis-wkt-format.html

Step 3. Retrieve data from the Spatial Table.

Retrieve data from the Points table is even simpler:

SELECT name, AsText(location) FROM Points;

This returns a result set which location is converted in the same GIS format introduced in Step2,
in fact the AsText function converts the internal representation of a geometry to a string format.

There are another couple of functions very useful that can be introduced at this point:

SELECT name, AsText(location) FROM Points WHERE X(location) < style="font-weight: bold;">Y(location) > 12;


This SELECT returns a result set of points which location X() (longitude) is minor of 10 and Y() (latitude) is major of 12.

Step 4. Advanced queries on the Spatial Table.

Converts as readable text the envelope of the specified Geometry.

SELECT AsText(Envelope(GeomFromText('LineString(1 1,2 2)')));


Returns the size (float) of the specified Geometry.
SELECT Dimension(GeomFromText('LineString(1 1,2 2)'));

Returns the Geometry type (varchar) of the specified Geometry.
SELECT GeometryType(GeomFromText('POINT(1 1)'));

Find points in a bounding box.

SET @bbox = 'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))';
SELECT name, AsText(location) FROM Points WHERE Intersects( location, GeomFromText(@bbox) );

Step 5. Find points in a circular area.

This step want to show how to perform queries of point geometries on circular areas (that areas described by a a center and a radius.)

The first solution you could mind to find entities in a circular area would be:

SET @point = 'POINT(10 10)';
SET @radius = 20;
SELECT name, AsText(location) FROM Points WHERE Distance(location, GeomFromText(@point)) < @radius; But this solution DOESN'T WORK, because the Distance function is not implemented.
In fact the MySQL Spatial Extension docs specifies that only a subset of the OpenGis standard
has being covered by their implementatation !!

A good roundtrip to this problem is to use the intersect function.

NOTE: The MySQL Spatial Documentation specifies that the Intersect function for
every Geometry is approximated to the Intersect function of the bounding boxes of the involved
geometries.

Due this approximation to abtain a correct result we have to filter the intersect results whith a pitagoric distance
computation.

SET @center = GeomFromText('POINT(10 10)');
SET @radius = 30;
SET @bbox = CONCAT('POLYGON((',
X(@center) - @radius, ' ', Y(@center) - @radius, ',',
X(@center) + @radius, ' ', Y(@center) - @radius, ',',
X(@center) + @radius, ' ', Y(@center) + @radius, ',',
X(@center) - @radius, ' ', Y(@center) + @radius, ',',
X(@center) - @radius, ' ', Y(@center) - @radius, '))'
);

[1]
SELECT name, AsText(location)
FROM Points
WHERE Intersects( location, GeomFromText(@bbox) )
AND SQRT(POW( ABS( X(location) - X(@center)), 2) + POW( ABS(Y(location) - Y(@center)), 2 )) < @radius; To Obtain a result ordered by distance from the center of the selection area:

[2]
SELECT name, AsText(location), SQRT(POW( ABS( X(location) - X(@center)), 2) + POW( ABS(Y(location) - Y(@center)), 2 )) AS distance
FROM Points
WHERE Intersects( location, GeomFromText(@bbox) )
AND SQRT(POW( ABS( X(location) - X(@center)), 2) + POW( ABS(Y(location) - Y(@center)), 2 )) < @radius
ORDER BY distance;


Step 6. Verify performances


This last step want to provide a method to verify performances in querying large dataset on Spatial Data.

First of all is provided a storage procedure able to fill the Points table with a given number of random records.

CREATE PROCEDURE fill_points(
IN size INT(10)
)
BEGIN
DECLARE i DOUBLE(10,1) DEFAULT size;

DECLARE lon FLOAT(7,4);
DECLARE lat FLOAT(6,4);
DECLARE position VARCHAR(100);

-- Deleting all.
DELETE FROM Points;

WHILE i > 0 DO
SET lon = RAND() * 360 - 180;
SET lat = RAND() * 180 - 90;

SET position = CONCAT( 'POINT(', lon, ' ', lat, ')' );

INSERT INTO Points(name, location) VALUES ( CONCAT('name_', i), GeomFromText(position) );

SET i = i - 1;
END WHILE;
END

Then invoke the stored procedure with a significant number

One million records for example.

CALL fill_points(1000000);

And then performs the queries [1] and [2].

On my Intel Core Duo 2.0 GHz Laptop I obtain these results:

Circular area selection without sorting [1]

43862 rows in set ~1.10 sec with 1.000.000 records

Circular area selection with sorting [2]

43862 rows in set ~1.72 sec with 1.000.000 records

NOTE: your results can change in size being the record locations randomatically generated.