unpack



Description

Unpacks binary data from a file according to a record template.

Arguments and Return Values

Arguments: A template string and a file name, plus the optional skip, count, and col_names arguments.

Return Value: A davinci structure

Usage

Syntax: unpack(template = STRING, filename = STRING [, skip = 0 ] [, count = INT] [, col_names = STRING or TEXT BUFFER ])

The unpack function reads binary data from the input file, interprets it according to the specified record template, and returns it in a davinci structure. The optional skip parameter specifies the number of bytes to skip at the start of the file before decoding begins. The optional count parameter specifies the number of records to unpack (a negative value means all records). The template is made up of column specifications of the form "A[n][*m]", where "A" is one of the following letters:

   Letter     Meaning              Allowable byte-sizes ("n")
   'a'        string               1+
   'I'        signed msb int       1-8
   'U'        unsigned msb int     1-8
   'i'        signed lsb int       1-8
   'u'        unsigned lsb int     1-8
   'r'        lsb real             4, 8
   'R'        msb real             4, 8
   'x'        skip                 1+
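
To make the byte-order letters in the table concrete, here is a minimal Python sketch that decodes the raw bytes of a single field. It is an illustration only, not davinci code: the helper name decode_field is hypothetical, and strings are returned exactly as stored because this page does not say how davinci treats padding or NUL bytes.

```python
import struct

def decode_field(letter, raw):
    """Decode the bytes of one field according to an unpack template letter.

    Hypothetical helper for illustration; not part of davinci.
    """
    n = len(raw)
    if letter == 'a':
        # String: returned as stored (padding handling is not specified here).
        return raw.decode('latin-1')
    if letter in ('I', 'U', 'i', 'u'):
        if not 1 <= n <= 8:
            raise ValueError('integer fields must be 1-8 bytes')
        # Upper-case letters are msb-first (big-endian), lower-case lsb-first.
        order = 'big' if letter in ('I', 'U') else 'little'
        return int.from_bytes(raw, order, signed=letter in ('I', 'i'))
    if letter in ('r', 'R'):
        if n not in (4, 8):
            raise ValueError('real fields must be 4 or 8 bytes')
        order = '<' if letter == 'r' else '>'
        return struct.unpack(order + ('f' if n == 4 else 'd'), raw)[0]
    if letter == 'x':
        # Skipped bytes produce no value.
        return None
    raise ValueError('unknown template letter: %r' % letter)
```

For instance, decode_field('U', b'\x01\x00') gives 256 while decode_field('u', b'\x01\x00') gives 1, which is the whole difference between the msb and lsb variants.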

"n" is the size (in bytes), and "m" is the multiplicity, of repeat of "A[n]". If either "m" or "n" is not specified, it is assumed to be one. "*" is part of the syntax. For example, "i4*3" specifies an array of 3- 4-byte little-endian integers. Multiple column specifications are concatenated to form a template of a record. For example, "i4*3r8a5" specifies a 3-column record, with a 3-element array of 4-byte little-endian integers, followed by a little-endian double-float, followed by a character string of five characters. Gaps in the template can be specified using "x", e.g., "i4*3x10r8x5a5" specifies the same record structure as the previous example, except that the fields are non-contiguous.

Every field in the record is placed in its own column within a davinci structure. Columns receive generic names of the form "cn" or "cn_m" unless the user supplies names through the optional argument col_names. The argument col_names can be either a string (when only one column is read) or a davinci text buffer (string array). Providing too few names throws an error; providing extra names is fine.

A string column with multiplicity is split into multiple fields, because davinci text arrays cannot be 3-dimensional. Such a column consumes only one of the user-provided names and appends "_m" to it to indicate which repeat each resulting field holds.
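
The naming rules can be sketched in Python as follows. This is an illustration under stated assumptions, not davinci's implementation: the function name column_names is hypothetical, fields are given as (letter, n, m) tuples, default names start at "c1", the "_m" suffix is zero-based and applied only to string columns, and skipped ("x") fields consume no name. Those assumptions are read off the davinci 2.18 example output on this page.

```python
def column_names(fields, col_names=None):
    """Return the structure member names for a list of (letter, n, m) fields.

    Hypothetical helper; the numbering and suffix rules are assumptions
    inferred from the example output, not taken from davinci's source.
    """
    names = []
    index = 0  # number of named (non-skip) columns seen so far
    for letter, _n, m in fields:
        if letter == 'x':
            continue  # skipped bytes produce no column
        if col_names is None:
            base = 'c%d' % (index + 1)
        elif index < len(col_names):
            base = col_names[index]
        else:
            raise ValueError('not enough column names')
        index += 1
        if letter == 'a' and m > 1:
            # A repeated string column is split into one field per repeat.
            names.extend('%s_%d' % (base, k) for k in range(m))
        else:
            names.append(base)
    return names

fields = [('I', 1, 1), ('x', 4, 1), ('a', 6, 3)]
print(column_names(fields))                      # ['c1', 'c2_0', 'c2_1', 'c2_2']
print(column_names(fields, ['id', 'label', 'unused']))  # ['id', 'label_0', 'label_1', 'label_2']
```

Extra names are simply left unused, while running out of names raises an error, matching the behavior described above.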

Notes (these apply only to davinci before 2.18):

Before 2.18, davinci supported only unsigned bytes and signed shorts and ints as integer types,
so unpack resized column types where necessary. It checked columns with the types on the left
and, if needed, converted them to the types on the right:
   signed bytes      ->   shorts (required conversion)
   unsigned shorts   ->   ints
   unsigned ints     ->   doubles (floats have inadequate precision)
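
As a rough Python sketch, the pre-2.18 widening rule for integer fields of 1 to 4 bytes could look like this. The function name pre_218_type is hypothetical. The results for I1, u2/U2, i3, U3, and u4/U4 match the pre-2.18 example output on this page; the remaining cases (unsigned 1-byte, signed 2-byte and 4-byte) are assumptions inferred from the statement that davinci natively supported unsigned bytes and signed shorts and ints. Widths of 5 to 8 bytes are not covered because this page gives no rule for them.

```python
def pre_218_type(letter, n):
    """Guess the pre-2.18 davinci type for an integer field of n bytes.

    Hypothetical helper: partly inferred, see the assumptions above.
    """
    if letter not in ('I', 'i', 'U', 'u'):
        raise ValueError('not an integer field: %r' % letter)
    if not 1 <= n <= 4:
        raise ValueError('only widths 1-4 are covered by this sketch')
    if letter in ('I', 'i'):
        # Signed bytes have no native type, so they widen to short.
        return 'short' if n <= 2 else 'int'
    if n == 1:
        return 'byte'    # unsigned bytes are native (assumed unchanged)
    if n <= 3:
        return 'int'     # unsigned shorts (and 3-byte values) fit in an int
    return 'double'      # unsigned 4-byte ints need a double to keep all bits

print(pre_218_type('I', 1))  # short
print(pre_218_type('u', 2))  # int
print(pre_218_type('U', 4))  # double
```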

Examples

This example shows the behavior as of davinci 2.18, which keeps the integer types given in the template:
dv> t3 = unpack("Iu2i3u4r4r8x4U2U3U4R4R8a6*3", "strmulttest.dat", 0)
struct, 14 elements
   c1: 1x10x1 array of int8, bsq format [10 bytes]
   c2: 1x10x1 array of uint16, bsq format [20 bytes]
   c3: 1x10x1 array of int32, bsq format [40 bytes]
   c4: 1x10x1 array of uint32, bsq format [40 bytes]
   c5: 1x10x1 array of float, bsq format [40 bytes]
   c6: 1x10x1 array of double, bsq format [80 bytes]
   c7: 1x10x1 array of uint16, bsq format [20 bytes]
   c8: 1x10x1 array of uint32, bsq format [40 bytes]
   c9: 1x10x1 array of uint32, bsq format [40 bytes]
   c10: 1x10x1 array of float, bsq format [40 bytes]
   c11: 1x10x1 array of double, bsq format [80 bytes]
   c12_0: Text Buffer with 10 lines of text
       1: hello0
       2: hello0
       3: hello0
       4: hello0
       5: hello0
       6: hello0
       7: hello0
       8: hello0
       9: hello0
       10: hello0
   c12_1: Text Buffer with 10 lines of text
       1: hello1
       2: hello1
       3: hello1
       4: hello1
       5: hello1
       6: hello1
       7: hello1
       8: hello1
       9: hello1
       10: hello1
   c12_2: Text Buffer with 10 lines of text
       1: hello2
       2: hello2
       3: hello2
       4: hello2
       5: hello2
       6: hello2
       7: hello2
       8: hello2
       9: hello2
       10: hello2

 The remaining examples are from davinci versions before 2.18, which performed type upgrades.

 This example performs several type upgrades and splits the string multiplicity as described above.
 The file was created for testing, which is why the template uses every type, in this sequence:

 signed byte, 2-byte unsigned lsb int, 3-byte signed lsb int, 4-byte unsigned lsb int, lsb float, lsb double,
 skip 4 bytes, 2-byte unsigned msb int, 3-byte unsigned msb int, 4-byte unsigned msb int, msb float, msb double,
 and finally three 6-character strings

 dv> a = unpack("Iu2i3u4r4r8x4U2U3U4R4R8a6*3", "strmulttest.dat", 0)
 struct, 14 elements
   col_0: 1x10x1 array of short, bsq format [20 bytes]
   col_1: 1x10x1 array of int, bsq format [40 bytes]
   col_2: 1x10x1 array of int, bsq format [40 bytes]
   col_3: 1x10x1 array of double, bsq format [80 bytes]
   col_4: 1x10x1 array of float, bsq format [40 bytes]
   col_5: 1x10x1 array of double, bsq format [80 bytes]
   col_6: 1x10x1 array of int, bsq format [40 bytes]
   col_7: 1x10x1 array of int, bsq format [40 bytes]
   col_8: 1x10x1 array of double, bsq format [80 bytes]
   col_9: 1x10x1 array of float, bsq format [40 bytes]
   col_10: 1x10x1 array of double, bsq format [80 bytes]
   col_11[0]: Text Buffer with 10 lines of text
       1: hello0
       2: hello0
       3: hello0
       4: hello0
       5: hello0
       6: hello0
       7: hello0
       8: hello0
       9: hello0
       10: hello0
   col_11[1]: Text Buffer with 10 lines of text
       1: hello1
       2: hello1
       3: hello1
       4: hello1
       5: hello1
       6: hello1
       7: hello1
       8: hello1
       9: hello1
       10: hello1
   col_11[2]: Text Buffer with 10 lines of text
       1: hello2
       2: hello2
       3: hello2
       4: hello2
       5: hello2
       6: hello2
       7: hello2
       8: hello2
       9: hello2
       10: hello2

  This example uses the same file but reads only the first column and provides a name for it ("x64" skips the remaining 64 bytes of each 65-byte record):
  dv> a = unpack("Ix64", "strmulttest.dat", 0, col_names="my_name")
  struct, 1 elements
    my_name: 1x10x1 array of short, bsq format [20 bytes]

  Here is an example providing a text buffer (string array) for multiple names:
  dv> names = text(5)
  Text Buffer with 5 lines of text
    1:
    2:
    3:
    4:
    5:
  dv> names[,1,] = "name1"
  "name1"
  dv> names[,2,] = "name2"
  "name2"
  dv> names[,3,] = "name3"
  "name3"
  dv> names[,4,] = "name4"
  "name4"
  dv> names[,5,] = "name5"
  "name5"
  dv> a = unpack("Iu2i3u4r4x51", "strmulttest.dat", 0, col_names=names)
  struct, 5 elements
    name1: 1x10x1 array of short, bsq format [20 bytes]
    name2: 1x10x1 array of int, bsq format [40 bytes]
    name3: 1x10x1 array of int, bsq format [40 bytes]
    name4: 1x10x1 array of double, bsq format [80 bytes]
    name5: 1x10x1 array of float, bsq format [40 bytes]

Contact Developers

  • davinci-dev [AT] mars.asu.edu


Modified On: 11-18-2016
