C-Squares FAQ (Frequently Asked Questions)

Author: Tony Rees, CSIRO Marine and Atmospheric Research

Date created: 1 June 2003
Date last revised: 26 June 2003

1. What is c-squares, and what purpose does it serve?
2. Who can benefit from using c-squares?
3. Why not simply store, and quote, latitude and longitude values with a particular data item?
4. Who owns c-squares, and is the system proprietary?
5. Where can I see examples of c-squares in use?
6. What is the basis for the c-squares codes, and how easy is it to generate them from latitude/longitude data?
7. Why are the codes based on numbers, as opposed to using a wider range of alphanumeric characters?
8. Can I see a worked example of transforming a latitude/longitude to the equivalent c-square code (and back again)?
9. Are there any lat/lon pairs which don't fit the above treatment?
10. OK, so now how do I represent an area?
11. Wouldn't that result in very long strings of codes, for large areas?
12. Doesn't the "compression" method described above for representing areas, complicate the text-based spatial searching method?
13. What tools exist now, to generate c-squares codes?
14. What size of c-squares should I use to encode my data?
15. How long can c-squares strings be in practice?
16. How big (and what shape) are individual c-squares on the ground?
17. How do the codes behave in different hemispheres (N vs S, E vs W)?
18. How are "on the line" cases handled (i.e., data on a boundary between 2 or 4 adjacent squares)?
19. Is there a method to determine whether a particular c-square code is valid?
20. Are c-square codes intended to be explicitly visible, or to operate behind the scenes?
21. How is c-squares spatial searching implemented?
22. If I produce a list of c-squares, is there an easy way to represent the equivalent area/s on a map?
23. What is the relationship between c-squares and current formal standards e.g. FGDC, ISO, Open GIS, GML?
24. Aren't there already other systems around similar to c-squares?
25. What are the advantages/disadvantages of a gridded approach to representing spatial extents?
26. How do c-squares compare with bounding rectangles (minimum bounding rectangles, MBR's) for representing spatial extents?
27. What c-squares development activity is going on, and who is doing it?
28. How can I find out more about c-squares?

1. What is c-squares, and what purpose does it serve?

C-squares stands for "Concise Spatial Query and Representation System" and is a method of indexing the geographic location of objects or observational data on the surface of the earth, in a simple alphanumeric format suitable for subsequent querying by any text-based system or search engine. At its heart is a tiled (gridded) representation of the earth, with each tile identified by a particular code derived from two of its boundaries of latitude and longitude. The spatial coverage or "footprint" of any point, line, area, or suite of points on the earth's surface can then be represented as a list of codes for all of the tiles within which the data or objects occur, and this list can then be stored, transmitted, or queried, without needing recourse to the actual data in the first instance. This list then forms a convenient spatial index, which can then be queried (along with other items which might be indexed such as names, dates, etc.) to ascertain whether or not a certain set of data, objects, or documents fit a user's desired search criteria, and are therefore worth retrieving or investigating further.

The value of "c-squares" is that, unlike other location-based indexes such as street addresses or postcodes, it is equally applicable anywhere on the surface of the globe, independent of country, language, whether or not the area is populated, or whether the observations (for example) which are being indexed occur on land or sea. In addition, c-squares can be defined at a flexible range of scales, from 10 x 10 degrees (approx. 1000 km) through 5 x 5 degrees (500 km), 1 x 1 degrees (100 km), 0.5 x 0.5 degrees (50 km), 0.1 x 0.1 degrees (10 km) and so on, as fine as the user requires.

2. Who can benefit from using c-squares?

Anyone interested in the storage, exchange, and retrieval of data or information with a geographic component, who does not wish to go to the level of sophistication of a fully fledged geographic information system (GIS) merely to be able to search their data holdings by geographic location, or exchange dataset "footprints" with other persons or compatible systems.

3. Why not simply store, and quote, latitude and longitude values with a particular data item?

Individual values of latitude and longitude can be, and in most cases would continue to be, stored with particular data items (georeferenced objects). C-squares provides an additional level of functionality over and above these "native" values, in several respects:
(i) the system reduces latitude and longitude (2 dimensional variable) to a single dimensional variable, for easy indexing and subsequent searching
(ii) the system reduces redundancy for multi-point data which occur within a single square (a single code indicating "data present" replaces multiple individual values, for metadata-level information)
(iii) owing to the hierarchical structure of the codes, a single (fine) level of encoding supports spatial querying at any of a range of coarser scales as well (for more information, see the answer to Q.21)
(iv) the system enables a wide variety of types of dataset "footprints" to be represented in a simple, human- and machine-readable manner, which can be easily queried, exchanged, and visualized, without requiring access to the source data.

4. Who owns c-squares, and is the system proprietary?

C-squares has been developed by Dr Tony Rees of CSIRO Marine and Atmospheric Research in Australia, in consultation with other interested parties, and is intended to be a completely open, public domain, non-proprietary format, e.g. as is the case with GeoTIFF (refer http://remotesensing.org/geotiff/faq.html, from which equivalent section some of the present wording has been "borrowed"). The following quotation from the GeoTIFF FAQ is also equally applicable to c-squares:
"There is no restriction on licensing, implementation, promulgation, or any uses of the format. The format is entirely open, and available to all. The specifications are public, there are abundant free software source libraries, toolkits, data samples, and technical support through the email forum."
Ongoing development of the c-squares specification and any special "use cases" is coordinated by Tony Rees at this time, with any changes available for discussion and comment from interested parties via the "c-squares-discuss" listserver prior to implementation.

5. Where can I see examples of c-squares in use?

At time of writing (June 2003) there are three c-squares enabled databases in operation at CSIRO Marine and Atmospheric Research, the most visible one (to external users) being the CMAR "MarLIN" metadata directory, which has a publicly accessible c-squares spatial search interface for research vessel data (and in addition, displays the relevant c-square codes on returned HTML metadata pages). Another large database planning to implement c-squares spatial indexing is the international OBIS (Ocean Biogeographic Information System) consortium, who anticipate using c-squares to index over a million biological data records before the end of 2003. Several other databases are currently employing just the mapping component of c-squares to display spatial data, among them FishBase (http://www.fishbase.org/), CephBase (http://www.cephbase.utmb.edu/), and OBIS itself.

6. What is the basis for the c-squares codes, and how easy is it to generate them from latitude/longitude data?

C-squares codes have been developed as an extension of an existing global grid, the WMO (World Meteorological Organization) 10 x 10 degree squares, a non-proprietary system for which maps can already be found on the internet (http://www.nodc.noaa.gov/OC5/wmoatlind.html and http://www.nodc.noaa.gov/OC5/wmopacific.html) and which is used (at that scale) to index substantial marine data holdings already (e.g., the NOAA "World Ocean Database"). The WMO grid was chosen because of its clearly defined relationship to principal lines of latitude and longitude at a 10 degree spacing (refer next question for the relevant detail). C-squares extends this notation by a hierarchical subdivision according to 2 sequences: a "main" sequence at 10 x 10, 1 x 1, 0.1 x 0.1 degrees, and an "intermediate" sequence at 5 x 5, 0.5 x 0.5, 0.05 x 0.05 degrees, etc. These two sequences are then "interwoven" to give the c-squares code for a square at any level of the hierarchy.

It is possible to generate the c-squares code for any combination of latitude and longitude, be it a point, a step along a line or poly-line, or a point within a polygon, either using an automated routine, a dedicated "click on a map" tool, or as a manual operation (see under Q.8). Users can also create a simple annotated map using standard printed divisions of latitude and longitude, and read off the code for any desired location directly.

7. Why are the codes based on numbers, as opposed to using a wider range of alphanumeric characters?

The main reason that the codes are purely numeric (other than the colon separator character) is the desire to retain as simple a relationship as possible with numeric values of latitude and longitude, as encoded directly into all but the leading character of each "cycle" of the hierarchy. Having said that, a case could be made for using a letter instead of a number for the leading character (since it is in effect a "special digit" rather than a number); however, the decision has been made to follow two existing precedents (WMO squares, see above, plus "Blue Pages" subdivisions of these squares), rather than introduce a new notation convention.

A separate advantage of a (basically) all-number system, as opposed to one incorporating other alphabetic characters, is that there is then a reduced chance of accidental transliteration of similar looking characters (e.g. O and 0, I and 1, lowercase "L" and 1), either during manual code entry, or OCR (optical character recognition) copying. Also, the requirement to standardise on (or discriminate between) either lowercase or uppercase characters is avoided.

8. Can I see a worked example of transforming a latitude/longitude to the equivalent c-square code (and back again)?

OK, here we go ... C-squares codes are built from coordinates of latitude and longitude as interleaved decimal degrees, with two cases of "special digits":
(i) In the initial cycle of the hierarchy (i.e., position [X] in the example below), a digit for "global quadrant", i.e. 1, 3, 5, 7 indicating NE, SE, SW, NW, respectively
(ii) In subsequent cycles of the hierarchy (i.e., position [Y] in the example below), a digit for "intermediate quadrant" assigned as follows:
... "1" where latitude is 0-4 ("low"), longitude is 0-4 ("low"), both on a 0-9 scale
... "2" where latitude is 0-4 ("low"), longitude is 5-9 ("high"), both on a 0-9 scale
... "3" where latitude is 5-9 ("high"), longitude is 0-4 ("low"), both on a 0-9 scale
... "4" where latitude is 5-9 ("high"), longitude is 5-9 ("high"), both on a 0-9 scale

In addition, initial whole units of latitude are padded with zeroes as required to give 2 leading digits (e.g. 0-90 become 00-90), and initial whole units of longitude are padded with zeroes as required to give 3 leading digits (0-180 become 000-180).

C-squares are encoded according to the principle of interleaving latitude and longitude, with the above "special digits" inserted as appropriate once every encoding cycle (=decimal subdivision). In principle, a c-square code

[X]abb:[Y]ab:[Y]ab:[Y]ab:[Y]ab:[Y]ab...

interleaves a latitude of aa.aaaa... with a longitude of bbb.bbbb..., in global quadrant [X], with values of [Y] inserted according to whether the subsequent values of "a" and "b" are low or high on the scale described above.

Here's an example in practice:
The Washington Monument in Washington, D.C. has a quoted location of 38 degrees 53 minutes 22 seconds north, 77 degrees 2 minutes 8 seconds west, or (in decimal degrees) latitude 38.8894, longitude -77.0356. Here's how this would encode into c-squares:

(a) Interleave the latitude and longitude, as per the principle above:

     latitude    _3__:_8_:_8_:_8_:_9_:_4_
     longitude    _07:__7:__0:__3:__5:__6

     combination so far: _307:_87:_80:_83:_95:_46

Now, add the relevant "intermediate quadrant" number as the initial number in each triplet, according to the system described above:

     combination so far: _307:487:380:383:495:246

Finally, add the relevant "global quadrant" (in this case 7, indicating latitude is N, longitude is W) as the initial digit:

     Final c-squares code: 7307:487:380:383:495:246.

As a matter of interest, this does not in fact define a point but a small square (or more likely a rectangle), 0.0001 degrees square (approximately 10 metres high and somewhat less across), with our designated point at the bottom right hand corner (in this global quadrant). On account of the hierarchical nature of the c-squares codes, we can, in addition, say immediately that this point is also located in:

0.0005-degree square 7307:487:380:383:495:2
0.001-degree square 7307:487:380:383:495
0.005-degree square 7307:487:380:383:4
0.01-degree square 7307:487:380:383
0.05-degree square 7307:487:380:3
0.1-degree square 7307:487:380
0.5-degree square 7307:487:3
1-degree square 7307:487
5-degree square 7307:4, and
10-degree square 7307.

This means that the encoded point would be returned as a simple text match looking for the phrase, for example, "7307:487:3....": this is equivalent to a spatial search of "get me all the data in the 0.5 x 0.5 degree square bounded by 38.5 degrees and 39.0 degrees north, and 77.0 and 77.5 degrees west".
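
The encoding steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the CMAR source code: the function name encode and its n_cycles parameter are assumptions made for this example. It treats zero as north/east, and pulls values of exactly 90 or 180 degrees just inside the map edge so that they fall in an existing 10-degree square.

```python
from decimal import Decimal


def encode(lat, lon, n_cycles=1):
    """Return the c-squares code of the square containing (lat, lon).

    n_cycles is the number of 3-digit cycles after the initial 4-digit
    cycle: 0 gives a 10-degree square, 1 a 1-degree square, 2 a
    0.1-degree square, and so on.
    """
    # Decimal(str(x)) keeps 38.8894 exactly as typed, avoiding binary
    # floating-point artefacts when the digits are extracted.
    lat_d, lon_d = Decimal(str(lat)), Decimal(str(lon))

    # Global quadrant: 1 = NE, 3 = SE, 5 = SW, 7 = NW (zero counts as N / E).
    if lat_d >= 0:
        quadrant = 1 if lon_d >= 0 else 7
    else:
        quadrant = 3 if lon_d >= 0 else 5

    # Scale to integers holding every digit needed, truncating the rest.
    decimals = max(n_cycles - 1, 0)
    scale = 10 ** decimals
    lat_i = min(int(abs(lat_d) * scale), 90 * scale - 1)    # 90 -> 89.99...
    lon_i = min(int(abs(lon_d) * scale), 180 * scale - 1)   # 180 -> 179.99...

    # Zero-pad: 2 whole-degree digits for latitude, 3 for longitude.
    lat_s = str(lat_i).zfill(2 + decimals)
    lon_s = str(lon_i).zfill(3 + decimals)

    cycles = [f"{quadrant}{lat_s[0]}{lon_s[:2]}"]
    for k in range(n_cycles):
        a, b = int(lat_s[1 + k]), int(lon_s[2 + k])
        # Intermediate quadrant: 1 low/low, 2 low/high, 3 high/low, 4 high/high.
        y = 1 + 2 * (a >= 5) + (b >= 5)
        cycles.append(f"{y}{a}{b}")
    return ":".join(cycles)


code = encode(38.8894, -77.0356, n_cycles=5)
print(code)                            # 7307:487:380:383:495:246
# Membership of the 0.5-degree square is a plain prefix test.
print(code.startswith("7307:487:3"))   # True
```

Codes at the "intermediate" resolutions (5, 0.5, 0.05 degrees, etc.) are not produced directly by this sketch; they can be obtained by cutting the next finer code back to the leading digit of its last cycle.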

Decoding a c-squares code back to a region designated in lat/lon coordinates is straightforward. Consider the code as an "initial cycle" (4 digits), zero-to-many "intermediate cycles" (3 digits each), and zero or one "final cycle" (1 or 3 digits), all separated by colons. To transform a given c-squares code to its bounding coordinates of latitude and longitude, use the following steps:

(1) The first ("prefix") digit of the initial cycle indicates the directions in which latitudes and longitudes are measured, namely:

    1 = latitudes N, longitudes E;    3 = latitudes S, longitudes E;     5 = latitudes S, longitudes W;    7 = latitudes N, longitudes W

(2) The second digit, of the initial cycle, and the second digit of all subsequent cycles, give the latitude in decimal degrees, with the first cycle indicating tens, the second cycle units, the third cycle tenths, the fourth cycle one hundredths, and so on, e.g.

  from 7307:487:380:383 we extract latitude [N] 38.88 degrees (the second digit of each cycle: 7[3]07:4[8]7:3[8]0:3[8]3)

(3) The third and fourth digit, of the initial cycle, and the third digit of all subsequent cycles, give the longitude in decimal degrees, with the first cycle indicating tens, the second cycle units, the third cycle tenths, the fourth cycle one hundredths, and so on, e.g.

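  from 7307:487:380:383 we extract longitude [W] 77.03 degrees (the third and fourth digits of the initial cycle, then the third digit of each later cycle: 73[07]:48[7]:38[0]:38[3])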

(4) if the final cycle is incomplete (a single digit rather than three digits), e.g. 7307:487:380:383:4, to the values of latitude and longitude obtained above, we add the following:

  - final digit = 3 or 4: add another 0.5 of the last "full" subdivision of latitude - e.g. 38.88 becomes 38.885 (north or south)
  - final digit = 2 or 4: add another 0.5 of the last "full" subdivision of longitude - e.g. 77.03 becomes 77.035 (east or west)

(5) the values obtained above define the "minimum" boundaries of the square, i.e. closest to the global origin. To obtain the "maximum" boundaries we add a figure equal to the resolution, in directions away from the global origin. The resolution is obtained from the number of cycles in the code following the sequence 10 > 1 > 0.1 > 0.01 etc. If the final cycle is incomplete, instead of finishing the sequence with, for example, 0.01, we use 5x that value (0.05 in that example).

Thus: as deduced above, the code 7307:487:380:383 represents a 0.01-degree resolution square, with its "minimum" boundaries at 38.88 degrees N and 77.03 degrees W. Knowing the resolution, it also follows that the "maximum" boundaries are 38.89 N and 77.04 W.

Following the same logic, the code 7307:487:380:383:4 represents a 0.005-degree resolution square, extending from 38.885 to 38.89 degrees N, and 77.035 to 77.04 degrees W.
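
Steps (1) to (5) can be sketched as a short decoder. This is an illustrative example under stated assumptions: the function name decode and the choice to return a signed bounding box (south, west, north, east) are made up for this sketch, and codes containing wildcard asterisks are not handled.

```python
from decimal import Decimal


def decode(code):
    """Return (south, west, north, east) in signed decimal degrees."""
    cycles = code.split(":")
    first = cycles[0]

    # Step 1: the prefix digit gives the hemispheres.
    lat_sign = 1 if first[0] in "17" else -1   # 1, 7 = north
    lon_sign = 1 if first[0] in "13" else -1   # 1, 3 = east

    # Steps 2 and 3: tens of degrees from the initial cycle.
    lat = Decimal(int(first[1]) * 10)
    lon = Decimal(int(first[2:4]) * 10)
    size = Decimal(10)                          # current resolution

    for cycle in cycles[1:]:
        if len(cycle) == 3:
            # Complete cycle: one more decimal subdivision.
            size /= 10
            lat += int(cycle[1]) * size
            lon += int(cycle[2]) * size
        else:
            # Step 4: a single trailing digit selects one half of the square.
            size /= 2
            if cycle in ("3", "4"):
                lat += size
            if cycle in ("2", "4"):
                lon += size

    # Step 5: the far boundaries lie one resolution step from the origin.
    south, north = sorted((lat_sign * lat, lat_sign * (lat + size)))
    west, east = sorted((lon_sign * lon, lon_sign * (lon + size)))
    return south, west, north, east


print(decode("7307:487:380:383"))
print(decode("7307:487:380:383:4"))
```

Decimal arithmetic is used so that boundaries such as 38.885 come out exactly rather than as nearby binary fractions.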

9. Are there any lat/lon pairs which don't fit the above treatment?

There are two "special cases" where the above treatment needs additional explanation and/or a slight adjustment:
(i) values of zero latitude or longitude are treated as positive, i.e. 0 latitude is always in the NE or NW global quadrant (not SE or SW), 0 longitude in the NE or SE quadrant (not NW or SW).
(ii) values on the "edge of the map", i.e. latitude +90 or -90, longitude +180 or -180, are treated as (e.g.) +89.9999... , +179.9999..., because otherwise they would end up being assigned to 10 x 10 degree squares which don't exist (since, for example, there isn't a square extending from 90 N to 100 N, or 180 E to 190 E).

10. OK, so now how do I represent an area?

An area is represented in c-squares by a string of codes, at a chosen resolution, for all the tiles which the area overlaps, separated by the "|" (vertical bar or "pipe") character: e.g. (at 0.1 degree resolution) the whole of "D.C." (District of Columbia) around Washington would be represented as 7307:486:489|7307:486:499|7307:487:380|7307:487:381|7307:487:390|7307:487:391 (6 squares). Notice that one of these squares (7307:487:380) has already been encountered above, as enclosing the Washington Monument. If we were to go to a finer resolution, e.g. 0.05 degree squares, we would end up with 19 codes; a coarser resolution (e.g. 0.5 degree squares) would result in 2 codes only.

These lists can be arrived at by either of two methods. One (requiring human input) would be to overlay a map of the desired area with (for example) a 0.1 degree grid, select a random point in every square which the target area overlaps, and encode that (or simply read the code off a suitably annotated map, and enter it manually). The other (which can be automated, but requires a series of designated points which, when connected by straight lines, serve to define the perimeter of the target area), is to use a polygon-fill algorithm which will automatically generate a representative set of points within the supplied polygon (at a resolution exceeding that ultimately required), then send each of these to a c-squares encoder and generate a list of unique codes to be built into the c-squares string.
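
The automated method might look something like the following sketch. It is not the CMAR prototype: the function names, the oversample parameter, and the use of a standard ray-casting point-in-polygon test as the "polygon-fill" step are all assumptions made for illustration. Polygons are given as (latitude, longitude) vertices, and polygons crossing the 180-degree meridian are not handled.

```python
from decimal import Decimal


def encode(lat, lon, n_cycles):
    """Compact point encoder (n_cycles 3-digit cycles after the first)."""
    lat_d, lon_d = Decimal(str(lat)), Decimal(str(lon))
    if lat_d >= 0:
        quadrant = 1 if lon_d >= 0 else 7
    else:
        quadrant = 3 if lon_d >= 0 else 5
    decimals = max(n_cycles - 1, 0)
    scale = 10 ** decimals
    lat_s = str(min(int(abs(lat_d) * scale), 90 * scale - 1)).zfill(2 + decimals)
    lon_s = str(min(int(abs(lon_d) * scale), 180 * scale - 1)).zfill(3 + decimals)
    cycles = [f"{quadrant}{lat_s[0]}{lon_s[:2]}"]
    for k in range(n_cycles):
        a, b = int(lat_s[1 + k]), int(lon_s[2 + k])
        cycles.append(f"{1 + 2 * (a >= 5) + (b >= 5)}{a}{b}")
    return ":".join(cycles)


def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: count polygon edges crossed by a ray heading east."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (y1, x1), (y2, x2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def polygon_to_csquares(polygon, n_cycles=2, oversample=5):
    """Sample the polygon on a grid finer than the target resolution,
    encode every sample that falls inside, and keep the unique codes."""
    step = 10.0 ** (1 - n_cycles) / oversample   # n_cycles=2 -> 0.1 / 5
    lats = [p[0] for p in polygon]
    lons = [p[1] for p in polygon]
    n_lat = int((max(lats) - min(lats)) / step) + 1
    n_lon = int((max(lons) - min(lons)) / step) + 1
    codes = set()
    for i in range(n_lat + 1):
        for j in range(n_lon + 1):
            lat = round(min(lats) + i * step, 6)
            lon = round(min(lons) + j * step, 6)
            if point_in_polygon(lat, lon, polygon):
                codes.add(encode(lat, lon, n_cycles))
    return "|".join(sorted(codes))


# A small rectangle straddling 77 degrees W (illustrative, not the D.C. boundary).
box = [(38.85, -77.05), (38.85, -76.95), (38.95, -76.95), (38.95, -77.05)]
print(polygon_to_csquares(box))
```

Because the method relies on sampling, a square that the polygon only clips by a sliver narrower than the sample spacing can be missed; raising oversample reduces that risk at the cost of more points to encode.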

11. Wouldn't that result in very long strings of codes, for large areas?

Yes, but ... there is a mechanism for "compressing" c-squares strings which is very effective, where a large area is to be encoded: a set of 1, 2, 3 or more "wildcard" characters (asterisks) can be employed following a "parent" cell at any level of the hierarchy, to indicate that all cells at the equivalent "child" level of the hierarchy are represented: thus, for example, "7307:*" indicates all four 5-degree squares within the "parent" 10-degree square 7307, "7307:***" indicates all one hundred 1-degree squares within the same parent, "7307:***:*" indicates all four hundred 0.5 degree squares within the same parent, and so on. This leads to considerable efficiencies in the representation method, similar (but not identical) to the approach of using "quadtrees" as a regular decomposition of space, and only traversing as far down the tree as needed to encode the level of detail required.

In any event, the number of codes required will decrease as one encodes at a coarser resolution, and increase (but not geometrically, if the compression notation is used) as one encodes at a finer resolution. Thus, it is important to try to establish a balance between the number of codes one may wish to store, and the level of spatial query the system will be designed to support, as the two are inter-related. Initial trials suggest that for each x2 linear step of increasing spatial resolution (=4 times the number of squares), something in the order of 30% more characters may be required to store the relevant codes, allowing for the compression efficiencies as described above in most cases. For example, to encode the whole world (global coverage) at 10 x 10 degree squares requires 648 codes of the form 1000|1001|1002, etc., while to encode the same at 1 x 1 degree squares requires only a modest increase to produce 1000:***|1001:***|1002:***|, etc., assuming all squares are full. In practice, "real" regions will tend to compress well in the interior, but probably little around the perimeter, which will mostly require to be represented as individual cells encoded at the highest resolution required.
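
As a rough illustration of the compression idea, the sketch below collapses the last 3-digit cycle of a list of codes: a parent with all 100 children becomes "parent:***", and a complete block of 25 children sharing the same leading digit becomes, for example, "parent:1**". The function name and the restriction to a single level of the hierarchy are assumptions of this example; a fuller implementation would repeat the process up the hierarchy.

```python
from collections import defaultdict


def compress_last_cycle(codes):
    """Collapse complete sets of sibling squares into wildcard codes.

    All input codes are assumed to be valid, at the same resolution,
    and to end in a complete 3-digit cycle.
    """
    children = defaultdict(set)
    for code in codes:
        parent, _, cycle = code.rpartition(":")
        children[parent].add(cycle)

    out = []
    for parent in sorted(children):
        cycles = children[parent]
        if len(cycles) == 100:
            # Every child present: one wildcard code replaces them all.
            out.append(f"{parent}:***")
            continue
        for quadrant in "1234":
            members = sorted(c for c in cycles if c[0] == quadrant)
            if len(members) == 25:
                # One intermediate quadrant is completely filled.
                out.append(f"{parent}:{quadrant}**")
            else:
                out.extend(f"{parent}:{c}" for c in members)
    return "|".join(out)


# All one hundred 1-degree squares of 10-degree square 1000.
full = [f"1000:{1 + 2 * (a >= 5) + (b >= 5)}{a}{b}"
        for a in range(10) for b in range(10)]
print(compress_last_cycle(full))                    # 1000:***
print(compress_last_cycle(full[:3] + ["1001:100"]))
```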

12. Doesn't the "compression" method described above for representing areas, complicate the text-based spatial searching method?

It does, slightly: in other words, searching for (say) "1000:100" (1 degree square) has to be able to match any of the three cases "1000:***", "1000:1**" and "1000:100"; searching for "1000:100:1" (0.5 degree square) has to match any of the four cases "1000:***:*", "1000:1**:*", "1000:100:*" and "1000:100:1". Whether such searches can be performed more efficiently than just testing for these four strings individually, has yet to be investigated fully, although it would seem (in principle) amenable to some logic similar to traversing a tree-like structure (suggestions from experienced developers are welcome).
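
One simple (not necessarily efficient) way to implement this is to generate every compressed form that could cover the search square, and test each as a text prefix. The sketch below does exactly that; the function names are hypothetical, and extending the FAQ's three- and four-case examples to longer codes in the same pattern is an assumption of this example.

```python
def search_variants(code):
    """List the literal strings that may represent a square covering `code`."""
    cycles = code.split(":")
    variants = []
    for i in range(1, len(cycles)):
        prefix = cycles[:i]
        # Every cycle after position i is fully wildcarded.
        tail = ["*" * len(c) for c in cycles[i + 1:]]
        variants.append(":".join(prefix + ["*" * len(cycles[i])] + tail))
        if len(cycles[i]) == 3:
            # Keep the intermediate-quadrant digit, wildcard the rest.
            variants.append(":".join(prefix + [cycles[i][0] + "**"] + tail))
    variants.append(code)
    return variants


def matches(query, stored):
    """True if any code in the '|'-separated string falls in square `query`."""
    variants = search_variants(query)
    return any(code.startswith(v)
               for code in stored.split("|") for v in variants)


print(search_variants("1000:100"))
# ['1000:***', '1000:1**', '1000:100']
print(search_variants("1000:100:1"))
# ['1000:***:*', '1000:1**:*', '1000:100:*', '1000:100:1']
print(matches("1000:100", "3000:***|1000:1**"))   # True
```

Prefix matching (rather than whole-code equality) is used so that a stored code at a finer resolution than the query is still found.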

13. What tools exist now, to generate c-squares codes?

CSIRO Marine and Atmospheric Research (CMAR) has developed c-squares encoders for lat/lon pairs (points), lines, poly-lines, and filled polygons, which are currently available as web utilities accessible via the link "lat/long to c-square converter" on the resources page - the user can input (for example) a lat/lon pair or polygon boundary, and the equivalent c-square code(s) will be returned. The source code for encoding point data is publicly available as an example, at http://www.cmar.csiro.au/csquares/resources.html; that for lines, poly-lines and filled polygons is currently at prototype stage but can be obtained on request from Tony Rees (e-mail) by anyone interested in developing it further, with a view to posting the results as a shared resource.

14. What size of c-squares should I use to encode my data?

This is a question with a possibly evolving answer. C-squares was initially developed for metadata systems (scientific data catalogues) with a potentially world-wide, interoperable area of deployment, and it seems reasonable to suggest that in such systems (to permit interoperability), dataset spatial extents be described at a resolution of at least 1 x 1 degrees (approx. 100 x 100 km), to support a "base level" search capability at that level or coarser (i.e., 1 x 1, 5 x 5 or 10 x 10 degrees). However in individual environments, the decision may be made to support queries at a finer spatial level (e.g. 0.5 x 0.5 degrees or 0.1 x 0.1 degrees), in which case encoding will need to be at least at that level of resolution. As mentioned above, there is some deliberation required as far as (say) database design is concerned, in the level of resolution to encode, since for every finer step of resolution, potentially longer c-squares strings will be required to be stored, however this is a decision best made on a case-by-case basis.

Another way to approach this is to try some test encoding at a range of resolutions (say 1 x 1, 0.5 x 0.5, and 0.1 x 0.1 degrees), and see at what point sufficient detail is represented in regions of critical variability (e.g. coastlines, or oblique edges). If the representation appears too coarse, the finer squares sizes may be needed, and the consequent increase in number of codes may have to be treated as an acceptable overhead. (Of course, if only one or a few points are to be stored, or only small areas represented, encoding can be done at quite fine levels without difficulty).

15. How long can c-squares strings be in practice?

That depends on three factors: (a) the area (or length of line, or number of points) represented; (b) the resolution chosen for the encoding; and (c) whether or not large blocks of squares are included which can be "compressed" by the mechanism indicated above, in the answer to Question 11. Put simply, if a string represents 10 tiles, encoded at 1 degree resolution, with no compression possible, 89 characters will be required (=8 characters per code, plus 1 separator character after each code, except for the last). A string representing 1000 tiles at 0.5 degree resolution, again with no compression, would require 10,999 characters (=11,000 -1), since each code now has 10 characters, plus a separator. While these strings may appear somewhat lengthy, they are comparable in size to a moderate size text document, or (say) a small gif image, and are not too unwieldy to store and manipulate. Compression, too, will often result in substantial savings, for example a filled 10-degree square with 1-degree squares will occupy only 8 characters (e.g. "1000:***") instead of the 899 which would be needed ("1000:100|1000:101|1000:102|...") if each square were individually specified.
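
The arithmetic behind these figures is simple enough to capture in a one-line helper (the function name is made up for this illustration): each code contributes its own characters, and every code except the last is followed by one separator.

```python
def string_length(n_codes, chars_per_code):
    """Characters needed for n uncompressed codes joined by '|'."""
    if n_codes == 0:
        return 0
    return n_codes * chars_per_code + (n_codes - 1)


print(string_length(10, len("1000:100")))      # 89
print(string_length(1000, len("1000:100:1")))  # 10999
print(string_length(100, len("1000:100")))     # 899, versus 8 for "1000:***"
```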

16. How big (and what shape) are individual c-squares on the ground?

Individual c-squares, while appearing square on a simple "equatorial cylindrical" (EC) projection, actually are close to square only at the equator, and gradually become trapezoidal as latitudes increase (and in fact are more-or-less triangular where they meet at the poles). The height of a c-square is always constant at any given scale, e.g. 1-degree c-squares are approximately 110 km (actually 111) high, while their width varies from 111 km at the equator through 85 km at latitude 40 N or S, tending to zero at the poles. The sides of c-squares are straight lines on the EC projection, however the top and bottom (except at the equator) are slightly curved on the ground, being portions of parallels of latitude which are free of curvature only at the equator. This effect is most noticeable with larger c-squares and higher latitudes, while the sides are straight (being always a N-S bearing), though tapering slightly, at any latitude.

As a first approximation, the following sizes are applicable to c-squares:

size (degrees) ....... height ....... width [at equator] ... width [at 40° N/S]

10 x 10 .............. 1100 km ...... 1100 km .............. 850 km
5 x 5 ................ 550 km ....... 550 km ............... 430 km
1 x 1 ................ 110 km ....... 110 km ............... 85 km
0.5 x 0.5 ............ 55 km ........ 55 km ................ 43 km
0.1 x 0.1 ............ 11 km ........ 11 km ................ 8.5 km
0.05 x 0.05 .......... 5.5 km ....... 5.5 km ............... 4.3 km
0.01 x 0.01 .......... 1.1 km ....... 1.1 km ............... 850 m
0.005 x 0.005 ........ 550 m ........ 550 m ................ 430 m
0.001 x 0.001 ........ 110 m ........ 110 m ................ 85 m
0.0005 x 0.0005 ...... 55 m ......... 55 m ................. 43 m
0.0001 x 0.0001 ...... 11 m ......... 11 m ................. 8.5 m

Values for the length of a degree of longitude at any latitude can be obtained from the on-line converter available at http://pollux.nss.nima.mil/calc/degree.html, and are approximately as follows:
111 km at 0 degrees N/S; 105 km at 20 degrees N/S; 85 km at 40 degrees N/S; 56 km at 60 degrees N/S; 19 km at 80 degrees N/S.
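
For a quick estimate at other latitudes or resolutions, the figures above can be approximated with a cosine scaling. The sketch below assumes a spherical Earth with a constant 111.32 km per degree of latitude (an assumption of this example, not a value taken from the converter mentioned above), so results differ slightly from values computed on an ellipsoid.

```python
import math

KM_PER_DEGREE = 111.32  # assumed mean length of one degree on a sphere


def square_size_km(resolution_deg, latitude_deg):
    """Approximate (height, width) in km of a c-square at a given latitude."""
    height = resolution_deg * KM_PER_DEGREE
    # Meridians converge towards the poles, so width shrinks with cos(latitude).
    width = height * math.cos(math.radians(latitude_deg))
    return height, width


for lat in (0, 20, 40, 60, 80):
    height, width = square_size_km(1, lat)
    print(f"1-degree square at {lat:2d} deg: {height:.0f} km high, {width:.0f} km wide")
```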

17. How do the codes behave in different hemispheres (N vs S, E vs W)?

The patterns of c-square codes fan out from the global origin (0 deg. latitude, 0 deg. longitude), and form mirror images in N-S, and W-E directions. Thus, 10-degree square x817 is in the top right corner of the NE global quadrant (1817), the lower right corner of the SE global quadrant (3817), the lower left corner of the SW global quadrant (5817), and the top left of the NW global quadrant (7817). At subsequent levels of the hierarchy, square 100 is always closest to the global origin, and square 499 is always furthest away (same behavior as for the x817 squares just described). Diagrams of these patterns can be found in the c-squares specification, at http://www.cmar.csiro.au/csquares/spec1.htm.

18. How are "on the line" cases handled (i.e., data on a boundary between 2 or 4 adjacent squares)?

By definition, data "on the line" are always assigned to the next "higher" square in absolute terms (e.g. 10 rather than 9; -10 rather than -9), subject to the two types of special case described above (Q.9). This means that, for example, a point with latitude 10 N is encoded within a 10-degree square extending from 10-20, not 0-10 degrees. By extension, a search for data in a square which is notionally 0-10 is in reality a search for data in the range 0-9.9999... , which may have to be borne in mind in some circumstances.

One caveat to this applies to the endpoints of lines, or local maximum extents of polygons, where these occur on a boundary but the line or polygon does not extend into the next "higher" square. In these cases, the last point is not coded, since to do so would result in an incorrect representation of the line or polygon concerned (e.g. a square extending from 9-10 in both directions would appear to extend across four squares, 9-11 in both directions, instead of the expected value of one square).

19. Is there a method to determine whether a particular c-square code is valid?

Yes, there are several possible ways:

(a) by using the on-line validator, provided as part of the "lat/long - to c-squares converter" page (accessible from the resources section)

(b) by constructing one's own validator if desired, based on the supplied source code (also accessible via the "resources" section)

(c) from first principles, thus:

-- from the information presented above in the answers to Q.8 and Q.11, it can be seen that the structure of any valid c-squares code follows the pattern:

   single initial cycle + zero-to-many complete 3-digit cycles + zero or one incomplete cycle

where the initial cycle has 4 digits, following complete cycles (if present) have 3 digits, and any final incomplete cycle (if present) has 1 digit, and all cycles are separated by a colon character (:).

Here are all possible valid options for members of these cycles:

Initial cycle, e.g. "7307:...":

- First digit must be 1, 3, 5, or 7; second digit can be any number 0-8; third digit must be 0 or 1; if the third digit is 0, the fourth digit can be any number 0-9, if the third digit is 1, the fourth can be any number 0-7.

complete 3-digit cycles (where present), e.g. "____:487:380:383:495"   or   "____:487:380:383:495:_":

- First digit can be any number 1-4, or an asterisk.
- If first digit is 1, second digit must be a number 0-4, or an asterisk, and third digit must be a number 0-4, or an asterisk.
- If first digit is 2, second digit must be a number 0-4, or an asterisk, and third digit must be a number 5-9, or an asterisk.
- If first digit is 3, second digit must be a number 5-9, or an asterisk, and third digit must be a number 0-4, or an asterisk.
- If first digit is 4, second digit must be a number 5-9, or an asterisk, and third digit must be a number 5-9, or an asterisk.
- If first digit is an asterisk, both other digits must be asterisks.
- If either second or third digit is an asterisk, both must be asterisks.
- If the last digit of the previous cycle is an asterisk, all must be asterisks.

Final incomplete cycle (where present), e.g. "____:2"   or   "____:___:___:___:___:2":

- Same rules as for the first digit of a complete 3-digit cycle, except that the second and third digits are omitted (this indicates a resolution from the "intermediate" sequence, e.g. 0.05 degrees, as opposed to 0.1 or 0.01 degrees).

In effect, this means that (ignoring asterisks), of the 10,000 possible 4-digit codes for the initial cycle (0000 through 9999), only 648 (6.5%) are valid, while of the 1,000 possible 3-digit codes in complete 3-digit cycles, only 100 (10.0%) are valid. While on the one hand this could be seen as undesirable redundancy, it also means that an automated check can detect some coding errors, since at least some of them are likely to produce invalid codes.
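The rules in (c) can be sketched as a small validator. This is only an illustration written from the rules listed above, not the supplied source code mentioned in (b); the function name `is_valid_csquare` and the regular expressions are assumptions made for this example.

```python
import re

# Initial cycle: first digit 1/3/5/7, second digit 0-8, third and fourth digits 00-17.
INITIAL_RE = re.compile(r"[1357][0-8](?:0[0-9]|1[0-7])")
# Complete cycle: the first digit fixes which half (0-4 or 5-9) each following
# digit falls in; "n**" and "***" are the asterisk forms the rules allow.
COMPLETE_RE = re.compile(
    r"1[0-4][0-4]|2[0-4][5-9]|3[5-9][0-4]|4[5-9][5-9]|[1-4]\*\*|\*\*\*"
)
# Final incomplete cycle: the first digit (or an asterisk) only.
INCOMPLETE_RE = re.compile(r"[1-4*]")


def is_valid_csquare(code: str) -> bool:
    """Return True if `code` follows the cycle rules listed above."""
    cycles = code.split(":")
    if not INITIAL_RE.fullmatch(cycles[0]):
        return False
    wildcard_seen = False  # becomes True once a cycle ends in an asterisk
    last_index = len(cycles) - 1
    for i, cycle in enumerate(cycles[1:], start=1):
        if len(cycle) == 3:
            if wildcard_seen:
                ok = (cycle == "***")
            else:
                ok = COMPLETE_RE.fullmatch(cycle) is not None
        elif len(cycle) == 1 and i == last_index:
            # An incomplete cycle is only allowed in the final position.
            if wildcard_seen:
                ok = (cycle == "*")
            else:
                ok = INCOMPLETE_RE.fullmatch(cycle) is not None
        else:
            ok = False
        if not ok:
            return False
        if cycle.endswith("*"):
            wildcard_seen = True
    return True


print(is_valid_csquare("7307:487:380:383:495"))  # example code from the text
print(is_valid_csquare("7190"))  # third digit 1 followed by 9 breaks the rules
```

Counting the codes accepted by such a function, ignoring asterisks, is one way to confirm the figures of 648 valid initial cycles and 100 valid complete cycles quoted above.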

20. Are c-square codes intended to be explicitly visible, or to operate behind the scenes?

There are two answers to this question. First, c-squares can be implemented invisibly, as a behind-the-scenes indexing and search mechanism, hidden behind (for example) a clickable map as part of a user interface. This is probably desirable, since the user should not be expected to comprehend the code syntax in order to conduct a search. On the other hand, if c-squares codes are explicitly included in web documents, for example, then potentially they can be harvested as regular "words" by internet search engines and (one day) used for possible spatial searching.
Currently, in CMAR's "MarLIN" metadata directory, both approaches are used: the user search interface employs c-squares searching, but this is never apparent to the user; however the returned documents do include a set of c-squares codes, for potential harvesting by internet search engines (see example records at http://www.cmar.csiro.au/csquares/samples.htm).

21. How is c-squares spatial searching implemented?

C-squares spatial searching is a text-match procedure, best explained using a "telephone number" analogy.

For example, the author's agency telephone number (at CSIRO Marine and Atmospheric Research in Tasmania, Australia) is +61 362 325222, where +61 is the country code (Australia), +61 3 represents a region code (Victoria and Tasmania), +61 362 represents a sub-region code (southern Tasmania), and +61 362 32 a district code (central Hobart). Thus, all subscribers in a single country (Australia) could be selected from a list by searching for the text string +61......., all subscribers from a region could be selected by searching for +61 3.... , and so on.

C-squares works in a closely similar manner. Using the example given in the answer to Q.8 above, searching for "7307..." will return all georeferenced items within that 10 x 10 degree square, whatever the resolution at which they have been encoded; searching for "7307:4..." will return all items within that 5 x 5 degree square, and so on.

The only proviso is that if data have been encoded at, say, 5 x 5 degree resolution, then they will not automatically be returned if the search is conducted at a finer resolution (e.g. 1 x 1 degree squares). A suggested solution to this is to make such a search more "intelligent", e.g. by returning matches on the whole search string as "confirmed" hits, and matches on only that part of the string which exists in the target data (i.e., where the required resolution is not present) as additional "possible" hits - the data are in the vicinity of the search area requested, and may in fact overlap it, but this cannot be determined owing to limitations in the encoded resolution. This, therefore, is a good argument for encouraging data encoders to agree among themselves on a certain minimum encoding resolution (e.g. 1 x 1 degree squares) if interoperability between their respective systems is desired.
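As a minimal sketch of the "confirmed" and "possible" hits idea, the text match can be written as two prefix tests. The function name `classify_hits`, the record identifiers and the in-memory dictionary are assumptions for illustration only; a production system would more likely run the equivalent text match inside a database, and codes containing asterisks are not handled here.

```python
def classify_hits(search_code, records):
    """Split records into confirmed and possible hits for one search square.

    `records` maps a record identifier to the list of c-squares codes stored
    for that record (hypothetical structure, for illustration only).
    """
    confirmed, possible = [], []
    for record_id, codes in records.items():
        if any(code.startswith(search_code) for code in codes):
            # The stored code lies inside (or equals) the search square.
            confirmed.append(record_id)
        elif any(search_code.startswith(code) for code in codes):
            # The stored code is a coarser "parent" of the search square,
            # so the data may or may not fall within it.
            possible.append(record_id)
    return confirmed, possible


records = {
    "a": ["7307:487:380"],  # encoded at 0.1 degree resolution
    "b": ["7307:4"],        # encoded at 5 degree resolution only
    "c": ["7308:1"],        # a different 10 degree square
    "d": ["7307:3"],        # same 10 degree square, different 5 degree square
}
confirmed, possible = classify_hits("7307:487", records)
print(confirmed, possible)  # ['a'] ['b']
```

Record "b" is reported as a possible hit because its 5 x 5 degree square contains the 1 x 1 degree search square, but the data cannot be placed any more precisely than that.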

For additional remarks on spatial searching, see the answer to Q.8, with supplementary information under Q.12.

22. If I produce a list of c-squares, is there an easy way to represent the equivalent area/s on a map?

Yes, the present CMAR c-squares mapper has been constructed for this purpose. It accepts any valid string of c-squares as a web call (for example from an HTML form with the relevant form action), along with additional variables such as "title", "legend", etc., and will return an HTML page with a "best fit" map displaying the requested squares, together with the supplied title and legend. It is also possible to plot multiple strings on the same map; change maps to any of a range of options; enlarge the map; and create active maps, with "click-on-a-square" functionality - for relevant information, see the "about-mapper" page.

It is also possible that other tools to display c-squares on base maps may be created by individual developers to suit other specific needs or client groups in the future.

23. What is the relationship between c-squares and current formal standards e.g. FGDC, ISO, Open GIS, GML?

C-squares does not currently have any formal relationship with the standards groups mentioned; however, it has been brought to the attention of some of these groups informally via conference presentations and personal contact, so that members of (for example) FGDC, OGC and the developers of GML can take its existence into account where relevant. Being a non-proprietary standard should be beneficial in this regard, since none of the groups mentioned would have any interest in endorsing and/or adopting a standard which is not public-domain.

24. Aren't there already other systems around which are similar to c-squares?

To answer this question, it is best to divide c-squares into its two constituent parts: first the principle of building up representations of shapes from strings (lists) of tiles, and second the specific global grid employed.

To the author's knowledge, c-squares is the only "open" (i.e., non-proprietary) system described which employs the first approach, although a related approach underlies a number of proprietary, quadtree-based methods of spatial indexing (e.g. Oracle 8's "Spatial Cartridge"). For the latter, however, it is more usual to encounter locally defined tiling systems (tessellations) derived from the properties of the subject, rather than ones developed from first principles as a global coverage. (Various global coverages have been proposed, but do not appear to have been adopted in user-addressable, functional systems.)

Regarding the choice of global grid, a number of contenders are available besides the WMO 10 x 10 degree grid, including Marsden Squares (10 x 10 degree, but with a different notation); International Map of the World (IMW) rectangles (6 x 4 degrees); Maidenhead Locators (2 x 1 degree, also 5 minute x 2.5 minute subsquares); and others. In addition global grids have been described which divide the world into a series of triangles (Dutton's "Quaternary Triangular Mesh" and others), or hexagons; plus there are a number of local grids with equal-dimension tiles, such as the UK National Grid and others, however these typically do not scale to form a seamless global coverage. The following reasons resulted in the choice of WMO squares as a starting point, plus the custom hierarchical subdivision employed for c-squares:
(i) WMO 10 x 10 degree squares are the only system whose nomenclature incorporates actual digits of latitude and longitude, in decimal degrees
(ii) It was considered desirable to have a fully hierarchical system (each smaller unit incorporating the codes of all "parents" as preceding digits), so such a hierarchy was devised based on the initial choice of WMO squares as a starting point (as it did not already exist)
(iii) It was considered helpful to have a relatively large number of steps in the hierarchy (hence 10 > 5 > 1 > 0.5 > 0.1, etc. instead of 10 > 1 > 0.1 etc., or degree > minute > second), so as to be potentially useful at a wide and flexible range of scales - the conventional sequence of degree > minute > second having too large (60-fold) "jumps" between levels.
(iv) Squares were considered the only option for a system which continues to relate intuitively to bounding coordinates of latitude and longitude, in user-recognizable units, as the hierarchy is traversed.
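The step sequence in (iii) can be read directly off the structure of a code: the initial cycle denotes a 10 degree square, each complete 3-digit cycle divides the side length by 10, and a final 1-digit cycle gives the intermediate (5, 0.5, 0.05 ...) step. The sketch below assumes its input is already a valid code; the function name `cell_size_degrees` is an assumption for this example.

```python
from fractions import Fraction


def cell_size_degrees(code: str) -> Fraction:
    """Side length, in degrees, of the square named by a valid c-squares code."""
    cycles = code.split(":")
    size = Fraction(10)  # the initial 4-digit cycle is a 10 x 10 degree square
    for cycle in cycles[1:]:
        if len(cycle) == 3:
            size /= 10   # complete cycle: 10 > 1 > 0.1 > ...
        elif len(cycle) == 1:
            size /= 2    # final incomplete cycle: intermediate step (5, 0.5, ...)
        else:
            raise ValueError(f"unexpected cycle {cycle!r}")
    return size


for example in ("7307", "7307:4", "7307:487", "7307:487:3", "7307:487:380"):
    print(example, float(cell_size_degrees(example)))
```

`Fraction` is used rather than floating point so that sizes such as 0.1 and 0.05 degrees stay exact when compared.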

25. What are the advantages/disadvantages of a gridded approach to representing spatial extents?

The main advantage of a gridded approach (e.g., as compared with point data, or vector polygon boundaries) is relatively rapid searching, by seeking the relevant tile names, to see whether data or subject matter is present in the user's desired region of interest. There may be a cost elsewhere in producing the gridded representations to begin with, e.g. from data supplied as polygons, however this is an up front cost (compile once, use many times) and can be conducted as a background and/or offline task, similar to the continuous index-building tasks undertaken by an internet search engine as it crawls the web.

The disadvantages of this approach are as follows:
- The additional time needed to create the index, for polygon data in particular, the space needed to store the index once created, and (possibly) a mechanism to keep the index up-to-date (for data which are not static)
- The fact that a grid representation is less exact than a polygon (i.e., potential small errors/uncertainties will typically remain associated with any point, or portion of a boundary, their magnitude depending on the choice of tile size)
- The fact that the choice of tile size (at encoding) controls the finest level of spatial query which can be supported, for a given point or dataset
- The fact that polygons which adjoin (e.g. boundaries of adjacent regions) will frequently have a "boundary" set of squares in common - thus, data within one of these squares cannot unequivocally be assigned to either polygon, without supplementary information being held.

In order to compensate for the above, it is recommended that the original point, line or polygon vector information continue to be stored with the native data, where it can potentially be examined separately if needed to answer specific questions of the above types.

26. How do c-squares compare with bounding rectangles (minimum bounding rectangles, MBRs) for representing spatial extents?

Minimum bounding rectangles (MBRs, also known as bounding rectangles or bounding coordinates) are frequently used to represent dataset spatial extents because they are relatively easy to construct, store, exchange, and query. However they suffer from being a "good fit" to only a small subset of potential dataset footprints, since in the real world these are frequently irregular in shape, or regular but not aligned with parallels of latitude and longitude, or fragmented, or include holes (for example marine data around an island or continent) which the MBR method of representation does not cater for. C-squares has been designed specifically as an improvement over the MBR method of representation, to greatly reduce the "false positives" encountered when a "search" rectangle intersects a portion of a "data" rectangle which does not, in fact, contain any data.
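The "false positive" effect can be illustrated with a toy example. As a simplification, each 1 x 1 degree tile is represented here by the (latitude, longitude) of its south-west corner rather than by a real c-squares code, and the L-shaped footprint and all function names are invented for the illustration.

```python
def mbr(cells):
    """Bounding rectangle (south, west, north, east) of a set of 1-degree cells."""
    lats = [lat for lat, _ in cells]
    lons = [lon for _, lon in cells]
    return min(lats), min(lons), max(lats) + 1, max(lons) + 1


def rects_intersect(a, b):
    """True if two (south, west, north, east) rectangles overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# An L-shaped dataset footprint: one column of tiles plus one row of tiles.
footprint = {(lat, 0) for lat in range(5)} | {(0, lon) for lon in range(5)}
# A search area sitting in the empty corner of the L.
query = {(3, 3), (3, 4), (4, 3), (4, 4)}

mbr_hit = rects_intersect(mbr(footprint), mbr(query))
tile_hit = bool(footprint & query)
print(mbr_hit, tile_hit)  # True False
```

The rectangle test reports a match even though no data lie in the search area, whereas comparing the tile lists does not.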

27. What c-squares development activity is going on, and who is doing it?

C-squares development has been taking place at CSIRO Marine and Atmospheric Research in Australia (CMAR) since December 2001, principally by Tony Rees with assistance from Miroslaw Ryba and Philip Bohm. A variety of persons from other agencies and interest groups have also contributed, either via personal discussions or the "c-squares-discuss" listserver (see below).

Since 2001/2002, development has concentrated in the following areas:

- Development of the c-squares notation, and rules for handling "special cases" (e.g. as described in Q.18, above)

- Testing the generation, storage, and searching of codes in real-world databases

- Development of the CMAR c-squares mapper, and gradual incorporation of new features into this utility

- Development of encoding routines for points, sets of points, lines, poly-lines, and polygons

- Incorporation of new features to the polygon encoder, and trialling its usefulness for indexing satellite image data

- Interaction with interested parties around the world, including presentations at relevant technical meetings, and dissemination of materials via the website and the "c-squares-discuss" listserver.

For a more detailed list (which will be maintained to be as current as possible), consult the page C-squares News and Site Updates.

28. How can I find out more about c-squares?

The c-squares home page at http://www.marine.csiro.au/csquares/ is the current repository for all c-squares related information. There is a published article describing the system in the journal "Oceanography", vol. 16 no. 1 (March 2003), available in .pdf form via this link. Users can also submit technical questions, participate in online discussions, and view previous posts, via the "c-squares-discuss" listserver.

Author's address:

Dr. Tony Rees,
Divisional Data Centre, CSIRO Marine and Atmospheric Research,
GPO Box 1538,
Hobart, Tasmania 7001,
Australia.

Email: Tony.Rees@csiro.au.

Acknowledgements

I should like to acknowledge the contributions of the following persons in the development and uptake of c-squares to date:

Ken Walker (Museum Victoria) for showing me his system of data indexing and searching by numbered mapsheets (a significant precursor to c-squares)
John Hockaday (Geoscience Australia), Doug Nebert (US FGDC), Simon Cox (CSIRO Exploration and Mining) and Rob Atkinson (Social Change Online), and various participants at EOGEO 2002 for stimulating discussions
Rainer Froese (FishBase), Phoebe Zhang (OBIS) and James Wood (CephBase) for deploying production versions of aspects of the system
Lola Olsen (NASA GCMD), Dave Watts (Australian Antarctic Division), Glen Smith (CMAR), Edward King (CSIRO Office of Space Science Applications), Ross Swick (US National Snow and Ice Data Center), and others for their interest in the system
Miroslaw Ryba and Philip Bohm (CMAR), for excellent programming assistance
David Hastings (NGDC "GLOBE" Project) and Martin Dix (CSIRO Atmospheric Research) for the supply of base map images utilised in the c-squares mapper, and
A succession of my line managers at CMAR, for encouragement and support in developing and publicising "c-squares", over the period 2001-current.