root/WorldMill/trunk/docs/reading-data.txt

Revision 948 (checked in by seang, 1 year ago)

Add benchmark script and set eol-style to native on everything except the shapefile

Reading data
============

Create a workspace from the standard workspace factory using either a path to a
single data file or a path to a directory of similar files. A workspace maps
(in the Python sense) names to collections (or layers) of feature data,
providing access to them via items() and the other mapping methods.

    >>> from mill import workspace
    >>> w = workspace('docs/data')
    >>> w # doctest: +ELLIPSIS
    <mill.workspace.Workspace object at ...>
    >>> w.path
    'docs/data'
    >>> w.items() # doctest: +ELLIPSIS
    [('test_uk', <mill.collection.Collection object at ...>)]

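To make "maps in the Python sense" concrete, here is a minimal standalone
sketch of a read-only mapping. DemoWorkspace is a hypothetical stand-in, not
mill's Workspace class, and its layers are plain lists rather than real
collections; it only illustrates which methods a mapping gives you.

```python
from collections.abc import Mapping


class DemoWorkspace(Mapping):
    """Hypothetical stand-in: a read-only mapping of layer names to layers."""

    def __init__(self, layers):
        self._layers = dict(layers)

    def __getitem__(self, name):
        # Lookup by layer name, as in w['test_uk'].
        return self._layers[name]

    def __iter__(self):
        return iter(self._layers)

    def __len__(self):
        return len(self._layers)


demo = DemoWorkspace({'test_uk': ['feature 0', 'feature 1']})
names = list(demo.keys())    # keys(), items(), and "in" come from Mapping
pairs = list(demo.items())
```

Defining only __getitem__, __iter__, and __len__ is enough for Mapping to
supply keys(), values(), items(), get(), and membership tests.
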
A collection is a workspace item and can be obtained by name, as with any
mapping or dict. The name of a collection corresponds to the OGR layer name,
which for file-based data is the base of the file name. A collection's *schema*
property is currently a list of (name, integer type) tuples, but this is likely
to change in the future. The number of features in a collection, or its length,
can be obtained in the usual Python way, with len().

    >>> c = w['test_uk']
    >>> c # doctest: +ELLIPSIS
    <mill.collection.Collection object at ...>
    >>> c.name
    'test_uk'
    >>> c.schema
    [('CAT', 2), ('FIPS_CNTRY', 4), ('CNTRY_NAME', 4), ('AREA', 2), ('POP_CNTRY', 2)]
    >>> len(c)
    48

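The integer types in the schema are easier to read with names attached. The
sketch below assumes they are OGR field type codes (the OGRFieldType
enumeration), which the text above does not state outright; describe_schema
and OGR_FIELD_TYPES are hypothetical helpers, not part of mill.

```python
# A few common OGRFieldType codes. ASSUMPTION: the schema integers are
# OGR field type codes; the document only calls them "integer type".
OGR_FIELD_TYPES = {
    0: 'Integer',
    2: 'Real',
    4: 'String',
    8: 'Binary',
    9: 'Date',
    10: 'Time',
    11: 'DateTime',
}

# The schema shown in the doctest above, copied as a literal.
schema = [('CAT', 2), ('FIPS_CNTRY', 4), ('CNTRY_NAME', 4),
          ('AREA', 2), ('POP_CNTRY', 2)]


def describe_schema(schema):
    """Replace each integer type code with a readable name where known."""
    return [(name, OGR_FIELD_TYPES.get(code, 'Unknown(%d)' % code))
            for name, code in schema]


described = describe_schema(schema)
```

Under that assumption, CAT, AREA, and POP_CNTRY are real-valued fields and the
two country fields are strings.
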
Features in a collection can be accessed as if the collection were a dict.

    >>> f = c['1']
    >>> f.id
    '1'
    >>> f.properties['CNTRY_NAME']
    'United Kingdom'

Users can control the type of object returned for each feature by binding a
callable to the collection's object hook. The default object hook (used above)
is mill.feature.Feature, a class modeled loosely on GeoJSON. Callables must
take 3 positional parameters (id, properties, and WKB byte string) and return a
Python object, like so:

    >>> from mill.feature import Feature
    >>> def testing_feature(id, properties, wkb):
    ...     # Decode byte-string property values to unicode.
    ...     d = {}
    ...     for key, val in properties.items():
    ...         if isinstance(val, str):
    ...             val = val.decode('utf-8')
    ...         d[key] = val
    ...     # Hex-encode the raw WKB byte string.
    ...     return Feature(id, d, wkb.encode('hex'))

This one converts all string properties to unicode and hex-encodes the WKB byte
string extracted from the data. It is used like so:

    >>> c.object_hook = testing_feature
    >>> f = c['1']
    >>> f.id
    '1'
    >>> f.properties['CNTRY_NAME']
    u'United Kingdom'
    >>> f.geometry # doctest: +ELLIPSIS
    '0103000000010000000d0000003d1059a48...'

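Because a hook only has to accept (id, properties, wkb) and return a Python
object, it need not return a Feature at all. The following standalone sketch
builds a plain dict instead. The name dict_feature and the sample WKB bytes are
made up for illustration, and the code does not import mill.

```python
import binascii


def dict_feature(id, properties, wkb):
    """Hypothetical object hook: return a plain dict instead of a Feature."""
    return {
        'id': id,
        'properties': dict(properties),
        # Hex-encode the WKB bytes so the geometry is printable text.
        'geometry': binascii.hexlify(wkb).decode('ascii'),
    }


# Made-up input: little-endian WKB for POINT(0 0), i.e. a byte-order flag,
# geometry type 1, then two 8-byte doubles that are both zero.
sample_wkb = b'\x01\x01\x00\x00\x00' + b'\x00' * 16
record = dict_feature('1', {'CNTRY_NAME': 'United Kingdom'}, sample_wkb)
```

Given the contract described above, such a function could presumably be bound
with c.object_hook = dict_feature, though that has not been tested against
mill here.
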
More efficient access to features by id is provided by a collection's *all*
attribute, which opens a data access session:

    >>> features = c.all
    >>> f = features['1']
    >>> f.id
    '1'
    >>> f.properties['CNTRY_NAME']
    u'United Kingdom'

The object returned by *all* also supports the iterator protocol:

    >>> features = c.all
    >>> f = features.next()
    >>> f.id
    '0'
    >>> f.properties['CNTRY_NAME']
    u'United Kingdom'

Collections also provide a filtering iterator. The bbox parameter causes the
filter to pass only the features that intersect the specified
(minx, miny, maxx, maxy) bounding box. The tuple values are interpreted in the
same coordinate system as the data.

    >>> query = c.filter(bbox=(-1.0, 50.0, -0.5, 50.5))
    >>> query # doctest: +ELLIPSIS
    <mill.collection.Iterator object at ...>
    >>> results = [f for f in query]
    >>> len(results)
    1
    >>> f = results[0]
    >>> f.id
    '45'
    >>> f.properties['CNTRY_NAME']
    u'United Kingdom'

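As a rough illustration of what "intersect" means for a
(minx, miny, maxx, maxy) tuple, here is a standalone sketch of a box-against-box
test. It is not mill's implementation: the filtering is done by the library,
and whether it compares feature envelopes or full geometries is not stated
here. bbox_intersects and the sample boxes are hypothetical.

```python
def bbox_intersects(a, b):
    """Return True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    a_minx, a_miny, a_maxx, a_maxy = a
    b_minx, b_miny, b_maxx, b_maxy = b
    # The boxes intersect only if their ranges overlap on both axes.
    return (a_minx <= b_maxx and b_minx <= a_maxx and
            a_miny <= b_maxy and b_miny <= a_maxy)


query_box = (-1.0, 50.0, -0.5, 50.5)

# Made-up envelopes: one inside the query box, one well away from it.
inside = bbox_intersects(query_box, (-0.8, 50.2, -0.6, 50.4))
outside = bbox_intersects(query_box, (2.0, 51.0, 3.0, 52.0))
```

The test is symmetric, and boxes that share only an edge or a corner count as
intersecting in this sketch.
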