solutions of geo: geoprocessing

Dear All,
in this post I'd like to talk about the work we have done for the LaMMA consortium.

The Problem
The purpose of this project is to build a complete Spatial Data Infrastructure (SDI) that provides a facility for processing, publishing, and interactively visualising spatio-temporal raster data. The platform is a candidate to replace the current one, which was already built on Open Source software but was rather static and offered no OGC services.

The data to be ingested into the system is generated by an existing processing infrastructure which runs a set of different MetOc models. Our goal is to manage the geophysical parameters (or variables) produced by the following models:

- ARW ECM (3 km resolution and 9 km resolution)
- GFS (50 km resolution)

Ingestion starts every day at noon and at midnight, so each model has two run times a day at each resolution, and the data produced by each run covers a set of forecast times:
- ARW ECM (3 days with interval of 1h)
- GFS (8 days with interval of 6h)
The data is produced in GRIB format (version 1).
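
To make the run time / forecast time relationship concrete, here is a minimal Python sketch that enumerates the forecast instants of a single run from the horizons and intervals listed above. It is only an illustration, not part of the LaMMA platform: the function name is made up, and whether the run time itself counts as the first forecast time is an assumption exposed as a parameter.

```python
from datetime import datetime, timedelta

# Forecast horizon and interval per model, taken from the list above.
MODELS = {
    "ARW ECM": (timedelta(days=3), timedelta(hours=1)),
    "GFS": (timedelta(days=8), timedelta(hours=6)),
}


def forecast_times(run_time, horizon, step, include_run_time=True):
    """Return the forecast instants produced by one model run.

    Assumption of this sketch: the run time itself (offset 0) is the first
    forecast time unless include_run_time is False.
    """
    first = 0 if include_run_time else 1
    count = int(horizon / step)  # number of intervals in the horizon
    return [run_time + i * step for i in range(first, count + 1)]


# The noon run of an arbitrary day.
run = datetime(2011, 11, 28, 12, 0)
for model, (horizon, step) in MODELS.items():
    times = forecast_times(run, horizon, step)
    print(model, len(times), times[0].isoformat(), times[-1].isoformat())
```

With two runs a day, each model and resolution yields two such series daily, and consecutive runs overlap on most of their forecast times.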

Our Solution
Leveraging the OpenSDI suite, specifically GeoServer, GeoNetwork and GeoBatch, as well as other well-known Open Source projects such as Apache Tomcat, the Apache HTTP Server and PostgreSQL, we built an extensible, standards-based platform that automatically ingests and publishes data.

The infrastructure we have put together is depicted in the deployment diagram below.


Deployment diagram

The infrastructure was designed from the start to scale to a large number of external users: it is based on a GeoServer Master/Slave setup in which multiple slaves can be installed for higher throughput. Caching will be tackled in a later phase.

As you can see, we provided three access levels for different types of users:

- Admin can access the entire infrastructure locally and add GeoServer instances to the cluster to improve performance
- Poweruser can remotely add files for ingestion and administer GeoBatch via Basic Authentication
- User can look at the ingested data by accessing one of the GeoServer slave machines through the Apache httpd proxy server. The load of these accesses is distributed among all available slaves.

As mentioned above, the main building blocks are as follows:

- GeoServer, for providing WMS, WCS and WFS services with support for the TIME and ELEVATION dimensions
- GeoNetwork, for publishing metadata for all data, with specific customizations for managing the TIME dimension in the datasets
- GeoBatch, for preprocessing and ingesting data and related metadata in near real time with minimal human intervention

Using GeoBatch for ingestion and data preprocessing

In the LaMMA project the GeoBatch framework is used to preprocess and ingest the incoming GRIB files, as well as to handle data removal based on a sliding temporal window (currently set to 7 days), since it was a design decision to keep only the last 7 days of forecasts available for live serving.

Below you can find a diagram depicting one of the automatic ingestion flows we created for the LaMMA project using the GeoBatch framework.


GeoBatch ingestion flow example

The building blocks of this flow are explained below:

- NetCDF2GeotiffAction reads the incoming GRIB file and produces a proper set of GeoTIFFs, performing on-the-fly tiling, pyramiding and unit conversions. Each GeoTIFF represents a 2D slice of one of the original 4D cubes contained in the source GRIB file.
- ImageMosaicAction uses the GeoServer Manager library to create the ImageMosaic store and layer in the GeoServer Master. The created ImageMosaic is configured to parse the values of the Time and Elevation dimensions from the GeoTIFFs in order to create 4D layers in GeoServer.
- XstreamAction takes an XML file and deserializes it to a Java object that is passed to the next action.
- FreeMarkerAction produces a proper XML metadata file for publishing in GeoNetwork, using a pre-cooked template and the data model passed to it.
- GeoNetworkAction publishes the metadata on the target GeoNetwork.
- ReloadAction forces a reload on all the GeoServer slaves so that they pick up the changes made by the master instance.

This type of flow (with a slightly different setup) is used to convert and publish the 3 different incoming models.

The other type of flow is the remove flow, which is composed of the following building blocks:

- ScriptingAction executes a remove.groovy script which will:
  - calculate the oldest time to retain
  - select the older files to be removed
  - search for and remove the matching metadata from GeoNetwork
  - remove the collected layers and stores from the GeoServer Master catalog
  - permanently delete the successfully removed files
- ReloadAction forces a reload on all the GeoServer slaves.
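
The first two steps of the remove script boil down to a date comparison against the sliding window. The sketch below shows that logic in Python rather than Groovy; it is not the actual remove.groovy script. The file names are hypothetical, and applying the 7 day window to the run time of each file (rather than to its forecast times) is an assumption.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # sliding temporal window used in the project


def oldest_time_to_retain(now, retention=RETENTION):
    """Anything strictly older than this instant falls out of the window."""
    return now - retention


def select_expired(files, now, retention=RETENTION):
    """Return the names of the files to remove, sorted for reproducibility.

    `files` maps a file name to the run time of the data it contains
    (an assumed layout, used here only for illustration).
    """
    cutoff = oldest_time_to_retain(now, retention)
    return sorted(name for name, run_time in files.items() if run_time < cutoff)


# Hypothetical file names and run times.
files = {
    "arw_3km_20111120T1200.tif": datetime(2011, 11, 20, 12),
    "arw_3km_20111121T1200.tif": datetime(2011, 11, 21, 12),
    "arw_3km_20111127T0000.tif": datetime(2011, 11, 27, 0),
}
now = datetime(2011, 11, 28, 12)
print(select_expired(files, now))
```

The remaining steps (removing metadata, layers and stores, then deleting the files) would operate on the list this selection returns.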

Using GeoNetwork for metadata management
We have customized the metadata indexing (thanks Lucene!) in GeoNetwork in order to index meteorological model executions in terms of both their run time and their forecast times.
Generally speaking, the data we are dealing with is driven by a meteorological model which produces, every day, a certain number of geophysical parameters whose temporal validity spans a certain number of time instants (forecast times) in the future. In GeoNetwork we currently create a new metadata object for each geophysical parameter (e.g. Temperature) of a new model run; this metadata object contains multiple links to WMS requests, one for each forecast time, leveraging the TIME dimension in GeoServer (see picture below). Moreover, the forecast times themselves are indexed so that advanced searches can be run on them.
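
As an illustration of what such per-forecast-time links can look like, the Python sketch below builds one WMS GetMap URL per forecast time by varying only the TIME parameter. Everything specific in it is an assumption made for the example: the endpoint URL, the layer name, the bounding box, the image size, the choice of WMS 1.1.1, and the treatment of forecast times as UTC.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode


def wms_time_links(base_url, layer, bbox, forecast_times, size=(768, 384)):
    """Build one GetMap URL per forecast time, differing only in TIME."""
    links = []
    for t in forecast_times:
        params = {
            "SERVICE": "WMS",
            "VERSION": "1.1.1",
            "REQUEST": "GetMap",
            "LAYERS": layer,
            "STYLES": "",
            "SRS": "EPSG:4326",
            "BBOX": ",".join(str(v) for v in bbox),
            "WIDTH": size[0],
            "HEIGHT": size[1],
            "FORMAT": "image/png",
            # ISO 8601 instant; the times are assumed to be UTC.
            "TIME": t.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
        links.append(base_url + "?" + urlencode(params))
    return links


# Hypothetical endpoint, layer and extent: three hourly forecast times.
run = datetime(2011, 11, 28, 12)
times = [run + timedelta(hours=h) for h in range(3)]
for link in wms_time_links(
    "http://example.com/geoserver/wms",
    "lamma:temperature",
    (5.0, 35.0, 20.0, 48.0),
    times,
):
    print(link)
```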

If you have questions about the work described in this post, or if you want to know more about how our services could help your organization reach its goals, do not hesitate to contact us.

The GeoSolutions team,

Finding all the objects within a certain distance from a point is surely a common GIS problem. The problem is normally solved using OGC "dwithin" filters or by computing a buffer and then finding all the intersecting objects.

Very often both approaches fail miserably when the coordinate system is a geographic one, as common libraries such as JTS and GEOS are not able to handle its non-planar nature. As far as "dwithin" is concerned, recent Oracle and PostGIS versions can manage the problem properly, but what to do if they cannot be used?

We had to solve this problem when computing data distribution statistics over the raster data cells that lie within a certain distance from a given point, and we needed the calculation to be accurate regardless of how long the distance was.

To do that we created a new GeoServer WPS process, "gs:PointBuffers", that can create a set of buffers given a point, a target SRS and a set of distances in meters.

If the SRS denotes a geographic spatial reference system, the GeoTools GeodeticCalculator is used to sample the set of points that lie at the given distance, looping over a closed set of azimuths to cover the entire shape.
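
The azimuth loop can be sketched in a few lines of Python. This is not the code of the WPS process: instead of the GeoTools GeodeticCalculator it uses the direct formula on a sphere with the mean Earth radius, which is a simplifying assumption, and the function names and the number of azimuth steps are made up for the example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius: a spherical approximation


def destination(lon, lat, azimuth_deg, distance_m):
    """Point reached from (lon, lat) after distance_m along azimuth_deg."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(azimuth_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance
    sin_phi2 = (math.sin(phi1) * math.cos(delta)
                + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    phi2 = math.asin(max(-1.0, min(1.0, sin_phi2)))
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    # Normalise the longitude to [-180, 180).
    lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0
    return lon2, math.degrees(phi2)


def point_buffer(lon, lat, distance_m, steps=72):
    """Closed ring of points lying distance_m away from (lon, lat)."""
    ring = [destination(lon, lat, 360.0 * i / steps, distance_m)
            for i in range(steps)]
    ring.append(ring[0])  # repeat the first point to close the ring
    return ring


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance on the same sphere, used as a sanity check."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# A 100 km buffer around a point in northern Italy.
ring = point_buffer(11.0, 45.0, 100_000.0)
lons = [p[0] for p in ring]
lats = [p[1] for p in ring]
print("longitude span (deg):", max(lons) - min(lons))
print("latitude span (deg):", max(lats) - min(lats))
```

Every vertex is the same metric distance from the centre, yet the ring covers more degrees of longitude than of latitude. Rings large enough to reach a pole or cross the antimeridian would need extra handling that this sketch does not attempt.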

Interested in seeing the results? I certainly was.

Let's start with a set of small buffers at a medium latitude: 10, 30, 50 and 100 km buffers around a point located in northern Italy. Here is the result:

As you can see, drawing the result in plain WGS84 (plate carrée for the connoisseurs) we get elliptical shapes. This should not come as a surprise if you consider that at 45° one degree of latitude spans roughly 111km, whilst a degree of longitude spans only about 79km (see the "Degree length" table at Wikipedia).
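
The two degree lengths are easy to check with a back-of-the-envelope calculation. The Python sketch below assumes a spherical Earth with the mean radius, so its results differ slightly from the ellipsoidal values in the Wikipedia table; the function name is made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius: a spherical approximation


def degree_lengths_km(lat_deg):
    """Length of one degree of latitude and of longitude at lat_deg."""
    one_degree = math.pi / 180.0 * EARTH_RADIUS_KM
    # Meridians converge towards the poles, so a degree of longitude
    # shrinks with the cosine of the latitude.
    return one_degree, one_degree * math.cos(math.radians(lat_deg))


lat_km, lon_km = degree_lengths_km(45.0)
print(f"latitude: {lat_km:.1f} km, longitude: {lon_km:.1f} km")
print(f"ratio: {lat_km / lon_km:.2f}")
```

At 45° the ratio is the square root of 2, which is why a buffer that is circular on the ground looks about 1.4 times wider than tall when its degrees are plotted as if they were planar units.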

What if we pump up the distance significantly? Let's try with 100, 500, 1000, 2000 and 3000km instead. Here is the result:

See the funny shape we get? This is the effect of the size of one degree of longitude shrinking as we move north.

It is also a good indicator of how deformed the now common WGS84 maps, often published on the web, are.

If you want to see the same data in a common projection, let's have a look at the same map in EPSG:3857 (aka the Google projection):

Somewhat better, even if the Mercator tendency to inflate areas at high latitudes is quite evident.
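
How strong is that inflation? For a spherical Mercator projection the local linear scale factor is 1/cos(latitude), and areas grow by its square. The Python sketch below simply evaluates that formula; the function names are made up for the example.

```python
import math


def mercator_scale(lat_deg):
    """Local linear scale factor of spherical Mercator at lat_deg."""
    return 1.0 / math.cos(math.radians(lat_deg))


def mercator_area_inflation(lat_deg):
    """Factor by which small areas are inflated at lat_deg."""
    return mercator_scale(lat_deg) ** 2


for lat in (0, 30, 45, 60, 70):
    print(f"{lat:2d} deg: lengths x{mercator_scale(lat):.2f}, "
          f"areas x{mercator_area_inflation(lat):.2f}")
```

Small areas are drawn twice as large as at the equator at 45° and four times as large at 60°, so the northern half of the largest buffers is visibly stretched.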

Well, this is it. The gs:PointBuffers process is soon going to land in GeoServer for your testing pleasure.

We'd very much like to tackle the same problem against lines and polygons as well. Interested? Let us know!

The GeoSolutions team

Monday, 28 November 2011

Serving Meteo data with GeoServer, GeoBatch and GeoNetwork: the LaMMA use case

Wednesday, 24 November 2010

Fun Stuff: Computing circular buffers in geographic coordinates