Reading data
============

Create a workspace from the standard workspace factory using either a path to a
single data file or a path to a directory of similar files. A workspace is a
mapping (in the Python sense) of names to collections (or layers) of feature
data, and provides access to them through items() and the other usual mapping
methods.

>>> from mill import workspace
>>> w = workspace('docs/data')
>>> w  # doctest: +ELLIPSIS
<mill.workspace.Workspace object at ...>
>>> w.path
'docs/data'
>>> w.items()  # doctest: +ELLIPSIS
[('test_uk', <mill.collection.Collection object at ...>)]

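The mapping behaviour described above can be illustrated without mill itself.
The following is a minimal sketch, not mill's implementation:
DirectoryWorkspace is a hypothetical stand-in that exposes the files in a
directory as a read-only mapping keyed by base file name, and the '.shp'
extension filter is an assumption made only for this example.

```python
import os
try:
    from collections.abc import Mapping  # Python 3
except ImportError:
    from collections import Mapping  # Python 2


class DirectoryWorkspace(Mapping):
    """Hypothetical stand-in for a workspace: maps base file names to paths."""

    def __init__(self, path, extension='.shp'):
        self.path = path
        self._paths = {}
        # Key each matching file by its name without the extension.
        for filename in sorted(os.listdir(path)):
            base, ext = os.path.splitext(filename)
            if ext.lower() == extension:
                self._paths[base] = os.path.join(path, filename)

    def __getitem__(self, name):
        return self._paths[name]

    def __iter__(self):
        return iter(self._paths)

    def __len__(self):
        return len(self._paths)
```

Because the class derives from Mapping, items(), keys(), values(), and the
``in`` operator are available once the three methods above are defined.
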
A collection is a workspace item and can be obtained by key, as with any
mapping or dict. The name of a collection is its OGR layer name, which is the
base of the data file's name. A collection's *schema* property is currently a
list of (name, integer type) tuples, but this is likely to change in the
future. The number of features in a collection, or its length, can be obtained
in the usual Python way, with len().

>>> c = w['test_uk']
>>> c  # doctest: +ELLIPSIS
<mill.collection.Collection object at ...>
>>> c.name
'test_uk'
>>> c.schema
[('CAT', 2), ('FIPS_CNTRY', 4), ('CNTRY_NAME', 4), ('AREA', 2), ('POP_CNTRY', 2)]
>>> len(c)
48

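The integers in the schema look like OGR field type codes (in OGR's
OGRFieldType enumeration, 2 is OFTReal and 4 is OFTString), though the text
above does not say so and warns that the format may change. Under that
assumption, the following sketch turns a schema list into a dict of field
names and type names. OGR_FIELD_TYPES and describe_schema are names made up
for this example and are not part of mill.

```python
# Subset of OGR's OGRFieldType enumeration. That schema integers are
# these codes is an assumption, not something mill documents here.
OGR_FIELD_TYPES = {
    0: 'Integer',
    2: 'Real',
    4: 'String',
    8: 'Binary',
    9: 'Date',
    10: 'Time',
    11: 'DateTime',
}


def describe_schema(schema):
    """Return a dict mapping each field name to an OGR type name."""
    return dict(
        (name, OGR_FIELD_TYPES.get(code, 'Unknown(%d)' % code))
        for name, code in schema
    )


schema = [('CAT', 2), ('FIPS_CNTRY', 4), ('CNTRY_NAME', 4),
          ('AREA', 2), ('POP_CNTRY', 2)]
described = describe_schema(schema)
print(described['CAT'])         # Real
print(described['CNTRY_NAME'])  # String
```
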
Features in a collection can be accessed by id, as if the collection were a
dict.

>>> f = c['1']
>>> f.id
'1'
>>> f.properties['CNTRY_NAME']
'United Kingdom'

Users can control the type of object returned for each feature by binding a
callable to the collection's *object_hook* attribute. The default object hook
(used above) is mill.feature.Feature, a class modeled loosely on GeoJSON. A
callable must take three positional parameters (the feature id, a mapping of
properties, and the WKB byte string) and return a Python object, like so:

>>> from mill.feature import Feature
>>> def testing_feature(id, properties, wkb):
...     # Decode byte-string property values to unicode; keep others as-is.
...     d = {}
...     for key, val in properties.items():
...         if isinstance(val, str):
...             val = val.decode('utf-8')
...         d[key] = val
...     # Hex-encode the raw WKB byte string used as the geometry.
...     return Feature(id, d, wkb.encode('hex'))

This hook converts all string properties to unicode and hex-encodes the WKB
byte string extracted from the data. It is used like so:

>>> c.object_hook = testing_feature
>>> f = c['1']
>>> f.id
'1'
>>> f.properties['CNTRY_NAME']
u'United Kingdom'
>>> f.geometry  # doctest: +ELLIPSIS
'0103000000010000000d0000003d1059a48...'

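An object hook need not return a Feature. As a further illustration, the
sketch below returns a plain dict shaped loosely like GeoJSON and reads the
geometry type from the WKB header. It stands alone and does not import mill:
dict_feature, WKB_TYPES, and the sample bytes are made up for the example, and
it assumes the hook receives the id, a mapping of properties, and a WKB byte
string, as testing_feature does.

```python
import binascii
import struct

# Geometry type codes from the WKB specification.
WKB_TYPES = {1: 'Point', 2: 'LineString', 3: 'Polygon', 4: 'MultiPoint',
             5: 'MultiLineString', 6: 'MultiPolygon',
             7: 'GeometryCollection'}


def dict_feature(id, properties, wkb):
    """Hypothetical object hook: build a GeoJSON-like dict for a feature."""
    # Byte 0 of WKB gives the byte order: 1 is little-endian, 0 big-endian.
    byte_order = '<' if bytearray(wkb[:1])[0] == 1 else '>'
    # Bytes 1-4 hold the geometry type as an unsigned 32-bit integer.
    (type_code,) = struct.unpack(byte_order + 'I', wkb[1:5])
    return {
        'id': id,
        'properties': dict(properties),
        'geometry': {
            'type': WKB_TYPES.get(type_code, 'Unknown'),
            'wkb_hex': binascii.hexlify(wkb).decode('ascii'),
        },
    }


# Sample data: the first 13 bytes of the hex geometry shown above, which
# form the header of a little-endian polygon with one ring of 13 points.
sample_wkb = binascii.unhexlify('0103000000010000000d000000')
feature = dict_feature('1', {'CNTRY_NAME': 'United Kingdom'}, sample_wkb)
print(feature['geometry']['type'])  # Polygon
```

Binding it would follow the same pattern as above, c.object_hook =
dict_feature, provided the collection passes the WKB as a byte string.
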
More efficient access to features by id is provided by a collection's *all*
attribute, which opens a data access session:

>>> features = c.all
>>> f = features['1']
>>> f.id
'1'
>>> f.properties['CNTRY_NAME']
u'United Kingdom'

The object returned by *all* also supports the iterator protocol:

>>> features = c.all
>>> f = features.next()
>>> f.id
'0'
>>> f.properties['CNTRY_NAME']
u'United Kingdom'

Collections also provide a filtering iterator. The *bbox* parameter causes the
filter to pass only the features that intersect the bounding box given as a
(minx, miny, maxx, maxy) tuple. The tuple values are understood to be in the
same coordinate system as the data.

>>> query = c.filter(bbox=(-1.0, 50.0, -0.5, 50.5))
>>> query  # doctest: +ELLIPSIS
<mill.collection.Iterator object at ...>
>>> results = [f for f in query]
>>> len(results)
1
>>> f = results[0]
>>> f.id
'45'
>>> f.properties['CNTRY_NAME']
u'United Kingdom'

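The intersection test behind such a filter can be pictured with plain tuples.
The following sketch does not show how mill or OGR implements filtering, and
whether the filter compares bounding boxes or exact geometries is not stated
here. bboxes_intersect is a hypothetical helper showing when two
(minx, miny, maxx, maxy) boxes overlap, counting shared edges as an
intersection.

```python
def bboxes_intersect(a, b):
    """Return True if boxes a and b, each (minx, miny, maxx, maxy), overlap.

    Boxes that only touch along an edge or at a corner count as
    intersecting in this sketch.
    """
    a_minx, a_miny, a_maxx, a_maxy = a
    b_minx, b_miny, b_maxx, b_maxy = b
    # The boxes are disjoint only if one lies entirely to one side of
    # the other along the x axis or the y axis.
    return not (a_maxx < b_minx or b_maxx < a_minx or
                a_maxy < b_miny or b_maxy < a_miny)


query_box = (-1.0, 50.0, -0.5, 50.5)
print(bboxes_intersect(query_box, (-0.8, 50.2, 0.3, 51.0)))  # True
print(bboxes_intersect(query_box, (2.0, 48.0, 3.0, 49.0)))   # False
```
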