<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:data="http://www.datamech.com/storage">
<xsd:element name="hexDump" type="xsd:hexBinary">
<xsd:annotation>
<xsd:appinfo>
<data:format dataCounter="EOF"/>
</xsd:appinfo>
</xsd:annotation>
</xsd:element>
</xsd:schema>
The storage format can also be given as a foreign attribute, as in this example.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:data="http://www.datamech.com/storage">
<xsd:element name="hexDump" type="xsd:hexBinary" data:dataCounter="EOF"/>
</xsd:schema>
The two are considered equivalent. They can be mixed, even in the same element. Writing the format as appinfo involves more work; however, the resulting schema may be accepted by more XML tools. Note that the annotation code has been in use for a while, whereas the foreign attribute code has only just been written and may have more bugs. So there may be cases where a command works as appinfo but not as a foreign attribute.
All command names are subject to change. We are still in the design phase, and not all commands are defined yet. Once we have a full set of commands, we can decide on the names with the full picture in mind. We will also try to borrow names from standards such as DFDL when they are functionally equivalent.
<data:format byteOrder="littleEndian|bigEndian"/>
When the datatype is byte, short, int, long, unsignedByte, unsignedShort, unsignedInt or unsignedLong, the value is stored as a 1-, 2-, 4- or 8-byte binary number. Float and double are also supported. The only format command is byteOrder, which is either byteOrder="littleEndian" or byteOrder="bigEndian"; the default is littleEndian. You will most likely want to set byteOrder in the schema default. A document is unlikely to mix endianness, but it is permitted.
<data:format printf="string%optionsnumber.numberfstring"/>
There are so many ways to store a decimal that we cannot hope to support them all. We handle only the simplest case, storing the decimal as a simple string. Even so, there are many ways it can be written: left or right justification, spaces or zeros as leading padding, a leading space or plus sign for positive numbers. So a lot of commands are needed to specify exactly how the decimal number is written.
<data:format booleanTrue="\1|string" booleanFalse="\0|string"/>
Boolean values are normally stored as "\0" or "\1", but you may store them as some other tag, such as "0" or "1", "F" or "T", "no" or "yes". Anything you like will do, as long as the pair is not ambiguous, like "a" and "aa".
<data:format boolean01="false|true"/>
In XML, boolean true can be expressed as "true" or "1", and false as "false" or "0". Normally the value is written out as false/true. If you want to save space, you can set boolean01="true" and get 0 and 1 instead.
<data:format bitField="false|lowBitFirst|highBitFirst"/>
A boolean is normally stored as one byte (or more if you change booleanTrue); however, it is possible to store 8 booleans in one byte. There are two ways of doing this:
lowBitFirst: true => 0x01, false true => 0x02, false false true => 0x04, etc.
highBitFirst: true => 0x80, false true => 0x40, false false true => 0x20, etc.
However, even if you declare bitField, it will not always be used. When an element of type boolean occurs multiple times, we can normally use bitField. However, if the element is nillable, there are really three states (false, true or nil), and bitField cannot be used.
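The two bit orders can be illustrated with a minimal Python sketch. The function name pack_bits and its signature are hypothetical, chosen for illustration only; this is not the tool's implementation, just the bit arithmetic implied by the values listed above.

```python
def pack_bits(flags, order="lowBitFirst"):
    """Pack up to 8 booleans into one byte value (illustrative helper)."""
    if len(flags) > 8:
        raise ValueError("at most 8 booleans fit in one byte")
    value = 0
    for index, flag in enumerate(flags):
        if flag:
            # lowBitFirst: the first boolean lands in bit 0.
            # highBitFirst: the first boolean lands in bit 7.
            shift = index if order == "lowBitFirst" else 7 - index
            value |= 1 << shift
    return value


print(hex(pack_bits([False, True], "lowBitFirst")))          # 0x2
print(hex(pack_bits([False, False, True], "highBitFirst")))  # 0x20
```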
Schema of sequence of boolean elements
<xsd:sequence>
<xsd:element name="elm1" type="xsd:boolean"/>
<xsd:element name="elm2" type="xsd:boolean"/>
<xsd:element name="elm3" type="xsd:boolean"/>
<xsd:element name="elm4" type="xsd:boolean"/>
<xsd:element name="elm5" type="xsd:boolean"/>
</xsd:sequence>
In theory we can use bitField here, but this is not implemented in this version.
Element state in the purchaseOrder schema is fixed length
<xsd:element name="state">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:length value="2"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
Debugging output:
<state>CA</state> according to data 0x4341 ("CA")
<data:format fieldWidth="number|xsd:maxLength"/>
If the string is variable length but we want to use a fixed number of bytes to store it, we can use fieldWidth to specify the number of bytes. While it is possible to use a number, it makes sense to use the maxLength facet instead, since it should be a normal part of the schema.
City element in the purchaseOrder schema has fixed field width and no dataCounter
<xsd:element name="city" data:dataCounter="" data:fieldWidth="xsd:maxLength">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:maxLength value="20"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
Debugging output:
<city>Mill Valley</city> according to data 0x4D696C6C2056616C6C6579202020202020202020 ("Mill Valley         ")
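As a rough sketch of the fixed field width, the following Python snippet reproduces the 20 bytes shown in the debugging output. The helper name pad_to_field_width is hypothetical, and the ASCII encoding is an assumption; the space padding is taken from the trailing 0x20 bytes in the output above.

```python
def pad_to_field_width(text, field_width, pad=b" "):
    """Right-pad the encoded string to exactly field_width bytes."""
    data = text.encode("ascii")  # assumption: single-byte ASCII text
    if len(data) > field_width:
        raise ValueError("value is longer than fieldWidth")
    return data.ljust(field_width, pad)


encoded = pad_to_field_width("Mill Valley", 20)
print(encoded.hex().upper())
```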
<data:format dataCounter = "|\0|EOF|terminator string|xsd:unsignedByte|xsd:unsignedShort|xsd:unsignedInt|xsd:nonNegativeInteger"/>
As discussed earlier, dataCounter="" means that there is no counter or terminator.
The partNum attribute in the purchaseOrder schema just uses the default dataCounter
<xsd:attribute use="required" name="partNum" type="SKU"/>
Debugging output:
partNum="872-AA" according to data 0x3837322D414100 ("872-AA`")
If the terminator is EOF, the string goes all the way to the end of the file. Or you
can pick your own terminator, such as "\r\n".
Element name in the purchaseOrder schema has a 1 byte counter in the front
<xsd:element name="name" type="xsd:string" data:dataCounter="xsd:unsignedByte"/>
Debugging output:
<name>Alice Smith</name> according to data 0x0B416C69636520536D697468 ("`Alice Smith")
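The terminator and counter variants can be compared side by side in a short Python sketch. The function names are hypothetical and ASCII encoding is assumed; the byte layouts follow the two debugging outputs above (a "\0" terminator for partNum, a one-byte length prefix for name).

```python
def encode_terminated(text, terminator=b"\x00"):
    """String data followed by a terminator, as with the default dataCounter."""
    return text.encode("ascii") + terminator


def encode_byte_counted(text):
    """String data preceded by a one-byte length (dataCounter="xsd:unsignedByte")."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("an unsignedByte counter cannot exceed 255")
    return bytes([len(data)]) + data


print(encode_terminated("872-AA").hex().upper())
print(encode_byte_counted("Alice Smith").hex().upper())
```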
<data:format dataCounterDigits="number|xsd:maxLength"/>
We can also have a counter with datatype nonNegativeInteger. Then we need to know the number of digits. You can specify the number of digits using dataCounterDigits, or set dataCounterDigits to "xsd:maxLength". This does not mean that the number of digits equals the maxLength of the string. Rather, the number of digits is the minimum needed to write maxLength. So if maxLength is 125, the number of digits is 3.
Element street in the purchaseOrder schema has a two-digit counter in the front
<xsd:element name="street" data:choiceTag="S" data:dataCounter="xsd:nonNegativeInteger" data:dataCounterDigits="xsd:maxLength">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:maxLength value="40"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
Debugging output:
<street>123 Maple Street</street> according to data 0x3136313233204D61706C6520537472656574 ("16123 Maple Street")
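The digit count derived from maxLength can be sketched in Python as follows. The helper names are hypothetical, and zero-padding of short counts is an assumption that the text does not confirm; the example only checks the case shown in the debugging output.

```python
def counter_digits(max_length):
    """Minimum number of decimal digits that can hold max_length."""
    return len(str(max_length))


def encode_digit_counted(text, max_length):
    """Prefix the string with its length written as decimal digits."""
    data = text.encode("ascii")  # assumption: single-byte ASCII text
    if len(data) > max_length:
        raise ValueError("value is longer than maxLength")
    # Assumption: counts shorter than the digit width are zero-padded.
    count = str(len(data)).zfill(counter_digits(max_length))
    return count.encode("ascii") + data


print(encode_digit_counted("123 Maple Street", 40))  # b'16123 Maple Street'
```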
In the last few examples, the space taken depends on the length of the string. Often, however,
the space taken is the same regardless of the length of the string, especially when
you are writing a whole struct out to the file. We can do this by
specifying both fieldWidth and dataCounter. For example, the MacOS datatype Str255
always takes up 256 bytes; the first byte is the length of the string, so a maximum
of 255 characters can be accommodated. It can be specified as
fieldWidth="255" dataCounter="xsd:unsignedByte".
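A Str255-style layout, combining a one-byte counter with a fixed field width, might look like the sketch below. The function name encode_str255 is hypothetical, and the pad byte is an assumption (space is used here only to match the city example); the text does not say what fills the unused bytes.

```python
def encode_str255(text, field_width=255, pad=b" "):
    """One length byte followed by a field of exactly field_width bytes."""
    data = text.encode("ascii")  # assumption: single-byte ASCII text
    if len(data) > min(field_width, 255):
        raise ValueError("string does not fit in the field")
    # Assumption: unused bytes are filled with the pad byte.
    return bytes([len(data)]) + data.ljust(field_width, pad)


record = encode_str255("homes")
print(len(record), record[0], record[1:6])  # 256 5 b'homes'
```

Whatever the string length, the record occupies 256 bytes, which is what makes it suitable for writing a whole struct at a fixed offset.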
Printf command with extra printout before and after
<xs:element name="name" type="xs:string" data:printf="[%s]"/>
Debugging output:
<name>homes</name> according to data 0x5B686F6D65735D0D0A ("[homes]``")
<data:format nil="\0|string" notNil="\1|string"/>
If an element is nillable, we need a flag to indicate whether it is nil or has a value. This command lets you specify the flag for a nil element and for a non-nil element. The example below uses the default nil flags.
Element billTo in the purchaseOrder schema is nillable
<xsd:element name="billTo" type="USAddress" nillable="true"/>
Debugging output:
billTo not nil according to data 0x01 ("`")
<data:format optionAbsent="\0|string" optionPresent="\1|string"/>
An attribute can be optional. An element with minOccurs="0" maxOccurs="1" is also considered optional. In both cases we need a flag to indicate whether the optional entity is absent or present. The optionAbsent and optionPresent commands let you set the value of the flag.
Attribute orderDate and element shipDate in the purchaseOrder schema are optional
<xsd:annotation>
<xsd:appinfo>
<data:format optionPresent="P" optionAbsent="A"/>
</xsd:appinfo>
</xsd:annotation>
<xsd:attribute name="orderDate" type="xsd:date"/>
<xsd:element minOccurs="0" name="shipDate" type="xsd:date"/>
Debugging output:
optional @orderDate present according to data 0x50 ("P")
orderDate="1999-10-20" according to data 0x313939392D31302D323000 ("1999-10-20`")
optional shipDate absent according to data 0x41 ("A")
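The debugging output above can be reproduced with a small Python sketch. The function name encode_optional is hypothetical; the "P" and "A" flags come from the example schema, and the "\0" terminator after the date is taken from the output shown.

```python
def encode_optional(value, present=b"P", absent=b"A", terminator=b"\x00"):
    """Write the presence flag, then the terminated value if there is one."""
    if value is None:
        return absent
    return present + value.encode("ascii") + terminator


# orderDate is present, shipDate is absent.
record = encode_optional("1999-10-20") + encode_optional(None)
print(record.hex().upper())
```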
<data:format itemCounter="xsd:unsignedByte|xsd:unsignedShort|xsd:unsignedInt|xsd:nonNegativeInteger|EOF|terminator string|?=look ahead string|?!look ahead string"/>
<data:format itemCounterDigits="number|xsd:maxOccurs"/>
When minOccurs is equal to maxOccurs, we know how many times an element repeats. However, if they are not equal and maxOccurs > 1, we need to store the repeat count. The situation is similar to the length of a string: we can have a counter in front, a terminator at the back, or use EOF as the terminator. As discussed earlier, itemCounter="EOF" is never inherited and is bypassed in the inheritance hierarchy. While we could use dataCounter for both purposes, it would not be a good idea. When we have a string, there are no child elements because we do not support mixed content, so we do not have to worry about any element inheriting the command. This is not true for elements: when we specify a counter, we may need to respecify the counter for all child elements. That is why we have separate counters for data and items. Changing the item counter does not affect the inheritance of the data counter.
<data:csv="false|true"/>
<data:csvTerminator="\r\n|string"/>
<data:csvSeparatorChar=",|char"/>
<data:csvQuoteChar=""|char"/>
<data:csvEscapeChar="none|char"/>
Comma-separated values (csv) is a commonly used file format. Usually, when csv files are translated into XML, the fields in each record correspond to a sequence in the XML schema. These delimited values do not mix very well with the rest of the storage formats discussed in this document. However, we cannot afford to ignore csv because it is so common. We need to think more about the issues and may revise this in the future.
777227878,Simi? D Roy,123000.00
convert it to XML
<employee>
<ssn>777227878</ssn>
<name>
<fName>Simi D</fName>
<lName>Roy</lName>
</name>
<salary>123000</salary>
</employee>
convert XML back to binary and get
777227878,"""Simi D"" Roy",123000.00
The name looks completely different, but it is really logically equivalent. At first I thought this was wrong because it fails round-trip fidelity. After working on it for a while, I now think this is really the right thing to do.
777227878    Simi? D Roy     123000
777227878    "Simi D" Roy    123000
"""Simi D"" Roy" may look bad to the untrained eye, but it shows up just fine in programs.
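The quoting in the round-trip result can be checked with Python's standard csv module, whose default dialect also uses "," as the separator and doubles the quote character inside a quoted field. This is only a sketch of the quoting rule, not the tool's own csv code, and the helper names are hypothetical.

```python
import csv
import io


def to_csv_record(fields):
    """Write one record using the default quote-doubling rules."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\r\n").writerow(fields)
    return buffer.getvalue()


def from_csv_record(record):
    """Read one record back into a list of field values."""
    return next(csv.reader(io.StringIO(record, newline="")))


fields = ["777227878", '"Simi D" Roy', "123000.00"]
record = to_csv_record(fields)
print(record, end="")  # 777227878,"""Simi D"" Roy",123000.00
```

Reading the record back yields the original field values, which is the sense in which the quoted form is logically equivalent.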
<data:format choiceTag="string"/>
When there is a choice, the binary data needs to identify which choice was taken. Each choice has a choice tag, which occurs immediately before the data of the chosen element. The choice tag is never inherited, since inheritance would give all tags the same value and make it impossible to distinguish between them. However, there is a default value for the choice tag: the first choice has a default tag of '\0', the second '\1', etc. We have only limited support for choice. Each choice must be a different element. If a particle in the choice is a sequence, identifying the choice in XML data can be tricky, and this is not supported in this version. Here is an example of using choice.
In the address of purchaseOrder schema, we can choose between street or poBox
<xsd:choice>
<xsd:element name="street" data:choiceTag="S" data:dataCounter="xsd:nonNegativeInteger" data:dataCounterDigits="xsd:maxLength">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:maxLength value="40"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="poBox" type="xsd:int" data:choiceTag="P"/>
</xsd:choice>
Debugging output:
Choice is poBox according to data 0x50 ("P")
<poBox>5354</poBox> according to data 0xEA140000 ("````")
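The poBox bytes above can be reproduced with Python's struct module: the choice tag comes first, followed by the 4-byte int in the default littleEndian order. The function name encode_po_box and its byte_order parameter are hypothetical, used here only to mirror the byteOrder command.

```python
import struct


def encode_po_box(number, choice_tag=b"P", byte_order="littleEndian"):
    """Choice tag followed by a 4-byte signed int in the given byte order."""
    fmt = "<i" if byte_order == "littleEndian" else ">i"
    return choice_tag + struct.pack(fmt, number)


print(encode_po_box(5354).hex().upper())               # 50EA140000
print(encode_po_box(5354, byte_order="bigEndian").hex().upper())  # 50000014EA
```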
The all compositor is similar to the choice compositor in the sense that
the particles also need a choice tag. The tag tells you the order in which the
elements are encountered.
The schema allows comment and browseable to appear in any order:
<xs:all>
<xs:element name="comment" type="xs:string" data:choiceTag=" comment="/>
<xs:element name="browseable" minOccurs="0" type="xs:string" data:choiceTag=" browseable="/>
</xs:all>
Debugging output:
Item in all is comment according to data 0x20636F6D6D656E743D (" comment=")
<comment>All Printers</comment> according to data 0x416C6C205072696E746572730D0A ("All Printers``")
Item in all is browseable according to data 0x2062726F77736561626C653D (" browseable=")
<browseable>no</browseable> according to data 0x6E6F0D0A ("no``")