Examples¶
Hello World¶
In its simplest form, ASDF is a way of saving nested data structures
to YAML. Here we save a dictionary with the key/value pair
'hello': 'world'.
from asdf import AsdfFile
# Make the tree structure, and create an AsdfFile from it.
tree = {'hello': 'world'}
ff = AsdfFile(tree)
ff.write_to("test.asdf")
# You can also make the AsdfFile first, and modify its tree directly:
ff = AsdfFile()
ff.tree['hello'] = 'world'
ff.write_to("test.asdf")
test.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
hello: world
...
Saving arrays¶
Beyond the basic data types of dictionaries, lists, strings and numbers, the most important thing ASDF can save is arrays. It’s as simple as putting a NumPy array somewhere in the tree. Here, we save an 8x8 array of random floating-point numbers. Note that the YAML part contains information about the structure (size and data type) of the array, but the actual array content is in a binary block.
from asdf import AsdfFile
import numpy as np
tree = {'my_array': np.random.rand(8, 8)}
ff = AsdfFile(tree)
ff.write_to("test.asdf")
Note
In the file examples below, the first YAML part appears as it
appears in the file. The BLOCK sections are stored as binary
data in the file, but are presented in human-readable form on this
page.
test.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
my_array: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [8, 8]
...
BLOCK 0:
allocated_size: 512
used_size: 512
data_size: 512
data: fa05b9262219e63f98a787bbf9d3dc3f90ea2b49...
#ASDF BLOCK INDEX
%YAML 1.1
--- [353]
...
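The 512-byte sizes in the BLOCK listing follow from the shape and datatype recorded in the YAML tree: 8 x 8 values of 8 bytes each. A minimal sketch of that arithmetic, using only the standard library; the helper name expected_data_size and the small ITEMSIZE table are illustrative and are not part of asdf:

```python
from math import prod

# Bytes per element for a few datatype names (illustrative subset only).
ITEMSIZE = {'int32': 4, 'int64': 8, 'float32': 4, 'float64': 8}

def expected_data_size(shape, datatype):
    """Return the uncompressed size in bytes of an array with this layout."""
    return prod(shape) * ITEMSIZE[datatype]

size = expected_data_size([8, 8], 'float64')
print(size)  # 512
```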
Schema validation¶
In the current draft of the ASDF schema, there are very few elements
defined at the top level; for the most part, the top level can
contain any elements. One of the few specified elements is data:
it must be an array, and is used to specify the “main” data content
(for some definition of “main”) so that tools that merely want to view
or preview the ASDF file have a standard location to find the most
interesting data. If you set this to anything but an array, asdf
will complain:
>>> from asdf import AsdfFile
>>> tree = {'data': 'Not an array'}
>>> AsdfFile(tree)
Traceback (most recent call last):
...
ValidationError: mismatched tags, wanted
'tag:stsci.edu:asdf/core/ndarray-1.0.0', got
'tag:yaml.org,2002:str'
...
This validation happens only when an AsdfFile is instantiated, read,
or saved, so it’s still possible to get the tree into an invalid
intermediate state:
>>> from asdf import AsdfFile
>>> ff = AsdfFile()
>>> ff.tree['data'] = 'Not an array'
>>> # The ASDF file is now invalid, but asdf will tell us when
>>> # we write it out.
>>> ff.write_to('test.asdf')
Traceback (most recent call last):
...
ValidationError: mismatched tags, wanted
'tag:stsci.edu:asdf/core/ndarray-1.0.0', got
'tag:yaml.org,2002:str'
...
Sharing of data¶
Arrays that are views on the same data automatically share the same data in the file. In this example an array and a subview on that same array are saved to the same file, resulting in only a single block of data being saved.
from asdf import AsdfFile
import numpy as np
my_array = np.random.rand(8, 8)
subset = my_array[2:4,3:6]
tree = {
    'my_array': my_array,
    'subset': subset
}
ff = AsdfFile(tree)
ff.write_to("test.asdf")
test.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
my_array: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [8, 8]
subset: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [2, 3]
  offset: 152
  strides: [64, 8]
...
BLOCK 0:
allocated_size: 512
used_size: 512
data_size: 512
data: 0492387ace99ee3f0dca96163b6fe23f585b6f09...
#ASDF BLOCK INDEX
%YAML 1.1
--- [482]
...
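The offset and strides recorded for subset can be reproduced by hand. The sketch below assumes a C-contiguous (row-major) parent array and plain start:stop slices with step 1; view_layout is a hypothetical helper, not an asdf or NumPy function.

```python
def view_layout(parent_shape, itemsize, slices):
    """Compute (shape, offset, strides) in bytes for a basic slice view.

    Assumes a C-contiguous parent and step-1 slices given as
    (start, stop) pairs, one per dimension.
    """
    # Row-major strides: the last axis advances by one item, each
    # earlier axis by the product of all later dimensions.
    strides = []
    step = itemsize
    for dim in reversed(parent_shape):
        strides.insert(0, step)
        step *= dim

    shape = [stop - start for start, stop in slices]
    offset = sum(start * stride for (start, _), stride in zip(slices, strides))
    return shape, offset, strides

# my_array[2:4, 3:6] on an 8x8 float64 array (8 bytes per item).
shape, offset, strides = view_layout([8, 8], 8, [(2, 4), (3, 6)])
print(shape, offset, strides)  # [2, 3] 152 [64, 8]
```

The offset is 2 rows of 64 bytes plus 3 items of 8 bytes, which gives the 152 shown in the listing.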
Saving inline arrays¶
For small arrays like these, you may not care about the efficiency
of a binary representation and want to just save the content directly
in the YAML tree. The set_array_storage method can be used to set the
type of block of the associated data, either internal, external or
inline.
- internal: The default. The array data will be stored in a binary block in the same ASDF file.
- external: Store the data in a binary block in a separate ASDF file.
- inline: Store the data as YAML inline in the tree.
from asdf import AsdfFile
import numpy as np
my_array = np.random.rand(8, 8)
tree = {'my_array': my_array}
ff = AsdfFile(tree)
ff.set_array_storage(my_array, 'inline')
ff.write_to("test.asdf")
test.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
my_array: !core/ndarray-1.0.0
data:
- [0.7726816827203242, 0.4151173721737259, 0.5156939147717585, 0.3477687440532522,
0.45284307954724357, 0.6266988582525486, 0.6459992271333249, 0.11536522192395504]
- [0.8528142199395172, 0.34458237346259113, 0.7090162114587883, 0.8954447263371279,
0.39735619383935494, 0.26671217478934295, 0.4179551229670473, 0.8740431628726147]
- [0.7169942901877134, 0.41192059287264393, 0.7423277857383657, 0.9182847348759194,
0.7400167915712794, 0.4993632063624319, 0.898811238901472, 0.6261061633708715]
- [0.5456008806753108, 0.3733473128556797, 0.2644500771285767, 0.49728155579026256,
0.580096336557075, 0.6470473721069336, 0.2165045171914678, 0.40358888957410977]
- [0.7556863137776323, 0.4818388333762561, 0.030923656562499846, 0.07991083634077156,
0.4135435450502086, 0.17720553099228564, 0.44234503817719883, 0.7297931845975514]
- [0.5598914869385364, 0.9853967505338581, 0.6381668920271317, 0.1730430134034613,
0.5600755013125209, 0.7189366063886848, 0.3217707979822725, 0.044565082458148164]
- [0.5509569406864036, 0.35415602281498226, 0.6833911498138794, 0.6331569734631903,
0.7298703540186904, 0.21639083352028032, 0.1504656372891342, 0.9590264269941154]
- [0.27454762535082244, 0.7031403862851929, 0.7248582374123043, 0.5543695761615545,
0.2619144465356187, 0.6957613552942141, 0.8968790186475265, 0.9983223308242803]
datatype: float64
shape: [8, 8]
...
Saving external arrays¶
ASDF files may also be saved in “exploded form”, in multiple files:
- An ASDF file containing only the header and tree.
- n ASDF files, each containing a single block.
Exploded form is useful in the following scenarios:
- Not all text editors may handle the hybrid text and binary nature of the ASDF file, and therefore either can’t open an ASDF file or would break an ASDF file upon saving. In this scenario, a user may explode the ASDF file, edit the YAML portion as a pure YAML file, and implode the parts back together.
- Over a network protocol, such as HTTP, a client may only need to access some of the blocks. While reading a subset of the file can be done using HTTP Range headers, it still requires one (small) request per block to “jump” through the file to determine the start location of each block. This can become time-consuming over a high-latency network if there are many blocks. Exploded form allows each block to be requested directly by a specific URI.
- An ASDF writer may stream a table to disk, when the size of the table is not known at the outset. Using exploded form simplifies this, since a standalone file containing a single table can be iteratively appended to without worrying about any blocks that may follow it.
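To make the Range-based access in the second scenario concrete, here is a sketch of the header a client might send to fetch one byte span of a single-file ASDF. The URL and the offsets are placeholders, the request is only constructed and never sent, and block_range_request is a hypothetical helper.

```python
from urllib.request import Request

def block_range_request(url, start, length):
    """Build (but do not send) a GET request for `length` bytes at `start`."""
    # HTTP byte ranges are inclusive at both ends.
    end = start + length - 1
    return Request(url, headers={'Range': 'bytes={}-{}'.format(start, end)})

# Placeholder URL and offsets, for illustration only.
req = block_range_request('http://example.com/test.asdf', 1024, 512)
print(req.get_header('Range'))  # bytes=1024-1535
```

In exploded form the same data would be fetched with an ordinary request for the block’s own file, so the client does not need to know any offsets in advance.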
To save a block in an external file, set its block type to 'external'.
from asdf import AsdfFile
import numpy as np
my_array = np.random.rand(8, 8)
tree = {'my_array': my_array}
ff = AsdfFile(tree)
# On an individual block basis:
ff.set_array_storage(my_array, 'external')
ff.write_to("test.asdf")
# Or for every block:
ff.write_to("test.asdf", all_array_storage='external')
test.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
my_array: !core/ndarray-1.0.0
  source: test0000.asdf
  datatype: float64
  byteorder: little
  shape: [8, 8]
...
test0000.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
...
BLOCK 0:
allocated_size: 512
used_size: 512
data_size: 512
data: a27f5b902e9be33f3a85e64011c7db3ff817b42b...
#ASDF BLOCK INDEX
%YAML 1.1
--- [255]
...
Streaming array data¶
In certain scenarios, you may want to stream data to disk, rather than writing an entire array of data at once. For example, it may not be possible to fit the entire array in memory, or you may want to save data from a device as it comes in to prevent data loss. The ASDF standard allows exactly one streaming block per file where the size of the block isn’t included in the block header, but instead is implicitly determined to include all of the remaining contents of the file. By definition, it must be the last block in the file.
To use streaming, rather than including a NumPy array object in the
tree, you include an asdf.Stream object which sets up the structure
of the streamed data, but will not write out the actual content. The
file handle’s write method is then used to manually write out the
binary data.
from asdf import AsdfFile, Stream
import numpy as np
tree = {
    # Each "row" of data will have 128 entries.
    'my_stream': Stream([128], np.float64)
}
ff = AsdfFile(tree)
with open('test.asdf', 'wb') as fd:
    ff.write_to(fd)
    # Write 100 rows of data, one row at a time. ``write``
    # expects the raw binary bytes, not an array, so we use
    # ``tobytes()`` (the replacement for the deprecated ``tostring()``).
    for i in range(100):
        fd.write(np.array([i] * 128, np.float64).tobytes())
test.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
my_stream: !core/ndarray-1.0.0
  source: -1
  datatype: float64
  byteorder: little
  shape: ['*', 128]
...
BLOCK 0:
flags: BLOCK_FLAG_STREAMED
allocated_size: 0
used_size: 0
data_size: 0
data: 0000000000000000000000000000000000000000...
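Because a streamed block records no size, a reader has to infer the length of the leading '*' dimension from however many bytes follow the block header. The sketch below mimics that with an in-memory buffer standing in for the streamed payload. It does not use asdf, and rows_in_stream is a hypothetical helper.

```python
import io
import struct

ROW_LENGTH = 128   # entries per row, as in Stream([128], np.float64)
ITEMSIZE = 8       # bytes per float64

def rows_in_stream(payload_size):
    """Infer the number of complete rows from the payload size in bytes."""
    return payload_size // (ROW_LENGTH * ITEMSIZE)

# Stand-in for the file handle: append 100 rows, one row at a time.
buf = io.BytesIO()
row_format = '<{}d'.format(ROW_LENGTH)
for i in range(100):
    buf.write(struct.pack(row_format, *([float(i)] * ROW_LENGTH)))

n_rows = rows_in_stream(buf.tell())
print(n_rows)  # 100
```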
References¶
ASDF files may reference items in the tree in other ASDF files. The
syntax used in the file for this is called “JSON Pointer”, but users
of asdf can largely ignore that.
First, we’ll create an ASDF file with a couple of arrays in it:
from asdf import AsdfFile
import numpy as np
tree = {
    'a': np.arange(0, 10),
    'b': np.arange(10, 20)
}
target = AsdfFile(tree)
target.write_to('target.asdf')
target.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
a: !core/ndarray-1.0.0
  source: 0
  datatype: int64
  byteorder: little
  shape: [10]
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
b: !core/ndarray-1.0.0
  source: 1
  datatype: int64
  byteorder: little
  shape: [10]
...
BLOCK 0:
allocated_size: 80
used_size: 80
data_size: 80
data: 0000000000000000010000000000000002000000...
BLOCK 1:
allocated_size: 80
used_size: 80
data_size: 80
data: 0a000000000000000b000000000000000c000000...
#ASDF BLOCK INDEX
%YAML 1.1
--- [429, 563]
...
Then we will reference those arrays in a couple of different ways.
First, we’ll load the target file in Python and use the
make_reference method to generate a reference to array a.
Second, we’ll work at the lower level by manually writing a JSON
Pointer to array b, which doesn’t require loading or having access
to the target file.
ff = AsdfFile()
with AsdfFile.open('target.asdf') as target:
    ff.tree['my_ref_a'] = target.make_reference(['a'])
ff.tree['my_ref_b'] = {'$ref': 'target.asdf#b'}
ff.write_to('source.asdf')
source.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
my_ref_a: {$ref: target.asdf#a}
my_ref_b: {$ref: target.asdf#b}
...
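Each $ref value above combines a file location with a fragment naming a position in that file’s tree. As a rough illustration of the idea, and not of how asdf resolves references internally, the sketch below splits such a string and walks a plain dictionary. The helper resolve_fragment and the sample tree are made up for the example, and only dictionary keys are handled.

```python
from urllib.parse import urldefrag

def resolve_fragment(tree, fragment):
    """Walk `tree` following a '/'-separated path such as 'b' or 'x/y'."""
    node = tree
    for part in fragment.split('/'):
        if part:                      # tolerate a leading '/'
            node = node[part]
    return node

# Stand-in for the tree stored in target.asdf.
target_tree = {'a': list(range(0, 10)), 'b': list(range(10, 20))}

uri, fragment = urldefrag('target.asdf#b')
print(uri, fragment)                                # target.asdf b
print(resolve_fragment(target_tree, fragment)[:3])  # [10, 11, 12]
```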
Calling find_references will look up all of the
references so they can be used as if they were local to the tree. It
doesn’t actually move any of the data, and keeps the references as
references.
with AsdfFile.open('source.asdf') as ff:
    ff.find_references()
    assert ff.tree['my_ref_b'].shape == (10,)
On the other hand, calling resolve_references places all of the
referenced content directly in the tree, so when we write it out
again, all of the external references are gone, with the literal
content in its place.
with AsdfFile.open('source.asdf') as ff:
    ff.resolve_references()
    ff.write_to('resolved.asdf')
resolved.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
my_ref_a: !core/ndarray-1.0.0
  source: 0
  datatype: int64
  byteorder: little
  shape: [10]
my_ref_b: !core/ndarray-1.0.0
  source: 1
  datatype: int64
  byteorder: little
  shape: [10]
...
BLOCK 0:
allocated_size: 80
used_size: 80
data_size: 80
data: 0000000000000000010000000000000002000000...
BLOCK 1:
allocated_size: 80
used_size: 80
data_size: 80
data: 0a000000000000000b000000000000000c000000...
#ASDF BLOCK INDEX
%YAML 1.1
--- [443, 577]
...
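The block index at the end of each listing holds the byte offset of every block. In the two-block listings above, consecutive offsets differ by 134 bytes for 80-byte blocks, which suggests 54 bytes of per-block overhead. That figure is inferred from these examples rather than quoted from the ASDF standard, and next_block_offset is a hypothetical helper:

```python
# Per-block overhead inferred from the listings above (563 - 429 - 80).
BLOCK_OVERHEAD = 54

def next_block_offset(offset, allocated_size):
    """Estimate where the block after the one at `offset` begins."""
    return offset + BLOCK_OVERHEAD + allocated_size

print(next_block_offset(429, 80))  # 563, as in target.asdf
print(next_block_offset(443, 80))  # 577, as in resolved.asdf
```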
YAML provides a similar feature, anchors and aliases, which also supports references within the same file. asdf supports these, but the JSON Pointer approach is generally favored because:
- It is possible to reference elements in another file.
- Elements are referenced by location in the tree, not by an identifier, so everything can be referenced.
Anchors and aliases are handled automatically by asdf when the
data structure is recursive. For example, here is a dictionary that is
included twice in the same tree:
d = {'foo': 'bar'}
d['baz'] = d
tree = {'d': d}
ff = AsdfFile(tree)
ff.write_to('anchors.asdf')
anchors.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
d:
  baz:
    baz: &id001
      baz: *id001
      foo: bar
    foo: bar
  foo: bar
...
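A recursive structure like this cannot be written by a serializer that simply walks the tree, which is why anchors and aliases are needed. As a point of comparison, the standard library’s json module (used here only as a contrast; it plays no part in asdf) refuses the same dictionary:

```python
import json

d = {'foo': 'bar'}
d['baz'] = d          # the dictionary now contains itself

print(d['baz'] is d)  # True

try:
    json.dumps(d)
    circular = False
except ValueError as exc:
    # JSON has no anchor/alias mechanism, so the cycle is reported as an error.
    circular = True
    print(exc)

print(circular)  # True
```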
Compression¶
Individual blocks in an ASDF file may be compressed.
You can easily zlib or bzip2 compress all blocks:
from asdf import AsdfFile
import numpy as np
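tree = {
    'a': np.random.rand(256, 256),
    'b': np.random.rand(512, 512)
}
target = AsdfFile(tree)
# Compress every block with zlib...
target.write_to('target.asdf', all_array_compression='zlib')
# ...or with bzip2 (label 'bzp2'). This second call overwrites the
# file, so the listing below shows the bzp2 result.
target.write_to('target.asdf', all_array_compression='bzp2')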
target.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
a: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [256, 256]
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
b: !core/ndarray-1.0.0
  source: 1
  datatype: float64
  byteorder: little
  shape: [512, 512]
...
BLOCK 0:
compression: bzp2
allocated_size: 500712
used_size: 500712
data_size: 524288
data: f0eac15d0900c03fa48ce94ac2bcd13f2422eea2...
BLOCK 1:
compression: bzp2
allocated_size: 2002225
used_size: 2002225
data_size: 2097152
data: b4b033bf5f12ea3f2400a9a07e9de63fba67c948...
#ASDF BLOCK INDEX
%YAML 1.1
--- [445, 501211]
...
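In the listing, used_size is only slightly below data_size, because uniformly random floating-point values leave little redundancy to remove. The sketch below lets you compare sizes yourself using the standard library’s zlib and bz2 modules on random float64 bytes. It illustrates the codecs only and makes no claim about how asdf invokes them; the dictionary keys simply reuse the labels from the example above.

```python
import bz2
import random
import struct
import zlib

random.seed(0)
# 4096 random float64 values in [0, 1), packed little-endian.
values = [random.random() for _ in range(4096)]
raw = struct.pack('<4096d', *values)

compressed = {
    'zlib': zlib.compress(raw),
    'bzp2': bz2.compress(raw),
}

for name, blob in compressed.items():
    # data_size is the original size; used_size is the compressed size.
    print(name, 'data_size:', len(raw), 'used_size:', len(blob))

# Both codecs are lossless: decompressing restores the original bytes.
restored_zlib = zlib.decompress(compressed['zlib'])
restored_bz2 = bz2.decompress(compressed['bzp2'])
```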
Saving history entries¶
asdf has a convenience method for recording the history of
transformations that have been performed on a file.
Given an AsdfFile object, call add_history_entry with a description
of the change and, optionally, a description of the software (i.e. your
software, not asdf) that performed the operation.
from asdf import AsdfFile
import numpy as np
tree = {
    'a': np.random.rand(256, 256)
}
ff = AsdfFile(tree)
ff.add_history_entry(
    u"Initial random numbers",
    {u'name': u'asdf examples',
     u'author': u'John Q. Public',
     u'homepage': u'http://github.com/spacetelescope/asdf',
     u'version': u'0.1'})
ff.write_to('example.asdf')
example.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
a: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [256, 256]
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
history:
- !core/history_entry-1.0.0
  description: Initial random numbers
  software: !core/software-1.0.0 {author: John Q. Public, homepage: 'http://github.com/spacetelescope/asdf',
    name: asdf examples, version: '0.1'}
  time: 2016-11-08 09:38:11.163666
...
BLOCK 0:
allocated_size: 524288
used_size: 524288
data_size: 524288
data: 2469662c7dfaed3fd455af388844d03f21005cee...
#ASDF BLOCK INDEX
%YAML 1.1
--- [610]
...
Saving ASDF in FITS¶
Sometimes you may need to store the structured data supported by ASDF
inside of a FITS file in order to be compatible with legacy tools that
support only FITS. This can be achieved by adding a special
extension named ASDF to the FITS file, containing the YAML
tree from an ASDF file. The array tags within the ASDF tree point
directly to other binary extensions in the FITS file.
First, make a FITS file in the usual way with astropy.io.fits. Here, we are building a FITS file from scratch, but it could also have been loaded from a file.
This FITS file has two image extensions, SCI and DQ respectively.
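import numpy as np
from astropy.io import fits
hdulist = fits.HDUList()
# np.float64 replaces the np.float alias, which newer NumPy releases removed.
hdulist.append(fits.ImageHDU(np.arange(512, dtype=np.float64), name='SCI'))
hdulist.append(fits.ImageHDU(np.arange(512, dtype=np.float64), name='DQ'))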
Next we make a tree structure out of the data in the FITS file. Importantly, we use the same arrays in the FITS HDUList and store them in the tree. By doing this, asdf will be smart enough to point to the data in the regular FITS extensions.
tree = {
    'model': {
        'sci': {
            'data': hdulist['SCI'].data,
        },
        'dq': {
            'data': hdulist['DQ'].data,
        }
    }
}
Now we take both the FITS HDUList and the ASDF tree and create an
AsdfInFits object. It behaves identically to the AsdfFile object,
but reads and writes this special ASDF-in-FITS format.
from asdf import fits_embed
ff = fits_embed.AsdfInFits(hdulist, tree)
ff.write_to('embedded_asdf.fits')
The special ASDF extension in the resulting FITS file looks like the
following. Note that the data source of the arrays uses the fits:
prefix to indicate that the data comes from a FITS extension.
content.asdf
#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
name: asdf, version: 1.2.1}
model:
  dq:
    data: !core/ndarray-1.0.0
      source: fits:DQ,1
      datatype: float64
      byteorder: big
      shape: [512]
  sci:
    data: !core/ndarray-1.0.0
      source: fits:SCI,1
      datatype: float64
      byteorder: big
      shape: [512]
...
To load an ASDF-in-FITS file, first open it with astropy.io.fits, and then
pass that HDU list to AsdfInFits:
with fits.open('embedded_asdf.fits') as hdulist:
    with fits_embed.AsdfInFits.open(hdulist) as asdf:
        science = asdf.tree['model']['sci']