Reading data
============

Create a workspace from the standard workspace factory using either a path to a
single data file or a path to a directory of similar files. A workspace is a
mapping (in the Python sense) of names to collections (or layers) of feature
data, providing access to them via items() and related mapping methods.

>>> from mill import workspace
>>> w = workspace('docs/data')
>>> w # doctest: +ELLIPSIS
<mill.workspace.Workspace object at ...>
>>> w.path
'docs/data'
>>> w.items() # doctest: +ELLIPSIS
[('test_uk', <mill.collection.Collection object at ...>)]

A collection is a workspace item and can be obtained by key, as with any
mapping or dict. The name of a collection corresponds to the OGR layer name,
which is the base of the data file's name. A collection's *schema* property is
currently a list of (name, integer type) tuples, but this is likely to change
in the future. The number of features in a collection, or its length, can be
obtained with len().

>>> c = w['test_uk']
>>> c # doctest: +ELLIPSIS
<mill.collection.Collection object at ...>
>>> c.name
'test_uk'
>>> c.schema
[('CAT', 2), ('FIPS_CNTRY', 4), ('CNTRY_NAME', 4), ('AREA', 2), ('POP_CNTRY', 2)]
>>> len(c)
48

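The integer codes in the schema look like OGR field type codes. That is an assumption: the text above only calls them "integer type". Under that assumption, a small lookup table makes the schema easier to read. The names OGR_FIELD_TYPES and describe_schema are illustrative and not part of mill, and the table lists only a subset of OGR's field types.

```python
# Hypothetical helper, not part of mill: translate integer type codes
# into OGR field type names, assuming the codes are OGRFieldType values.
OGR_FIELD_TYPES = {
    0: 'Integer',
    1: 'IntegerList',
    2: 'Real',
    3: 'RealList',
    4: 'String',
    5: 'StringList',
    8: 'Binary',
    9: 'Date',
    10: 'Time',
    11: 'DateTime',
}


def describe_schema(schema):
    """Return (name, type name) pairs for a list of (name, code) tuples."""
    return [(name, OGR_FIELD_TYPES.get(code, 'Unknown'))
            for name, code in schema]


schema = [('CAT', 2), ('FIPS_CNTRY', 4), ('CNTRY_NAME', 4),
          ('AREA', 2), ('POP_CNTRY', 2)]
described = describe_schema(schema)
for name, type_name in described:
    print(name, type_name)
```

Read this way, CNTRY_NAME and FIPS_CNTRY would be string fields and the remaining three would be real-valued fields.
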
Features in a collection can be accessed by id, as if the collection were a
dict.

>>> f = c['1']
>>> f.id
'1'
>>> f.properties['CNTRY_NAME']
'United Kingdom'

Users can control the type of object returned by binding a callable to the
collection's object hook. The default object hook (used above) is
mill.feature.Feature, a class modeled loosely on GeoJSON. A callable must take
three positional parameters (the feature id, its properties, and its WKB byte
string) and return a Python object, like so:

>>> from mill.feature import Feature
>>> def testing_feature(id, properties, wkb):
...     d = {}
...     for key, val in properties.items():
...         # Decode byte strings to unicode; leave other values alone.
...         if isinstance(val, str):
...             val = val.decode('utf-8')
...         d[key] = val
...     return Feature(id, d, wkb.encode('hex'))

The testing_feature hook converts all string properties to unicode and
hex-encodes the WKB byte string extracted from the data. It is used like so:

>>> c.object_hook = testing_feature
>>> f = c['1']
>>> f.id
'1'
>>> f.properties['CNTRY_NAME']
u'United Kingdom'
>>> f.geometry # doctest: +ELLIPSIS
'0103000000010000000d0000003d1059a48...'

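The hex string above is ordinary well-known binary, so its header can be decoded with the standard library alone. In WKB the first byte gives the byte order, the next four bytes the geometry type, and for a polygon the following two integers give the ring count and the point count of the first ring. The sketch below (Python 3) uses only the prefix visible in the doctest output; parse_polygon_header is an illustrative name, not a mill function.

```python
import struct


def parse_polygon_header(wkb_hex):
    """Return (byte_order, geometry_type, ring_count, first_ring_points)."""
    data = bytes.fromhex(wkb_hex)
    byte_order = data[0]
    # 1 means little-endian (NDR), 0 means big-endian (XDR).
    endian = '<' if byte_order == 1 else '>'
    geometry_type, ring_count, point_count = struct.unpack(
        endian + 'III', data[1:13])
    if geometry_type != 3:
        raise ValueError('not a WKB polygon: type %d' % geometry_type)
    return byte_order, geometry_type, ring_count, point_count


header = parse_polygon_header('0103000000010000000d000000')
print(header)
```

For this prefix the result is (1, 3, 1, 13): a little-endian polygon with one ring of 13 points.
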
More efficient access to features by id is provided by a collection's *all*
attribute, which opens a data access session:

>>> features = c.all
>>> f = features['1']
>>> f.id
'1'
>>> f.properties['CNTRY_NAME']
u'United Kingdom'

This attribute also provides the iterator protocol:

>>> features = c.all
>>> f = features.next()
>>> f.id
'0'
>>> f.properties['CNTRY_NAME']
u'United Kingdom'

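How *all* works internally is not described here. Purely as an illustration of an object that offers both keyed lookup and the iterator protocol, a stand-in written in Python 3 (where the built-in next() replaces the .next() method) might look like this. FeatureSession and its in-memory records are hypothetical and do not reflect mill's implementation.

```python
class FeatureSession:
    """Hypothetical stand-in: keyed access plus iteration over records."""

    def __init__(self, records):
        # records: dict mapping feature id (str) -> properties dict
        self._records = records
        self._ids = iter(sorted(records, key=int))

    def __getitem__(self, fid):
        return self._records[fid]

    def __iter__(self):
        return self

    def __next__(self):
        # next() raises StopIteration once the ids are exhausted.
        fid = next(self._ids)
        return fid, self._records[fid]


session = FeatureSession({'0': {'CNTRY_NAME': 'United Kingdom'},
                          '1': {'CNTRY_NAME': 'United Kingdom'}})
first_id, first_props = next(session)
print(first_id, first_props['CNTRY_NAME'])
print(session['1']['CNTRY_NAME'])
```
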
Collections also provide a filtering iterator. The bbox parameter causes the
filter to pass only the features that intersect the specified
(minx, miny, maxx, maxy) bounding box tuple. The tuple values are understood
to be in the same coordinate system as the data.

>>> query = c.filter(bbox=(-1.0, 50.0, -0.5, 50.5))
>>> query # doctest: +ELLIPSIS
<mill.collection.Iterator object at ...>
>>> results = [f for f in query]
>>> len(results)
1
>>> f = results[0]
>>> f.id
'45'
>>> f.properties['CNTRY_NAME']
u'United Kingdom'

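The text does not say whether the filter compares feature envelopes or exact geometries. As a minimal sketch of the simpler case, two (minx, miny, maxx, maxy) boxes intersect when their extents overlap on both axes. The bbox_intersects helper and the sample boxes below are illustrative and not part of mill.

```python
def bbox_intersects(a, b):
    """Return True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    a_minx, a_miny, a_maxx, a_maxy = a
    b_minx, b_miny, b_maxx, b_maxy = b
    return (a_minx <= b_maxx and b_minx <= a_maxx and
            a_miny <= b_maxy and b_miny <= a_maxy)


query_box = (-1.0, 50.0, -0.5, 50.5)
overlapping = bbox_intersects(query_box, (-0.8, 50.2, 0.3, 51.0))
disjoint = bbox_intersects(query_box, (2.0, 48.0, 3.0, 49.0))
print(overlapping, disjoint)
```

An envelope test like this is cheap but approximate: a feature whose envelope overlaps the query box may still lie entirely outside it.
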