Java: XML Interview questions

Describe the differences between XML and HTML.

It's amazing how many developers claim to be proficient in programming with XML, yet do not understand the basic differences between XML and HTML. Anyone with a fundamental grasp of XML should be able to describe some of the main differences outlined in the table below.

Table 1. Differences Between XML and HTML

XML	HTML
User-definable tags	Defined set of tags designed for web display
Content driven	Format driven
End tags required for well-formed documents	End tags not required
Quotes required around attribute values	Quotes not required
Slash required in empty tags	Slash not required

Describe the role that XSL can play when dynamically generating HTML pages from a relational database.

Even if candidates have never participated in a project involving this type of architecture, they should recognize it as one of the common uses of XML. Querying a database and then formatting the result set so that it can be validated as an XML document allows developers to translate the data into an HTML table using XSLT rules. Consequently, the format of the resulting HTML table can be modified without changing the database query or application code since the document rendering logic is isolated to the XSLT rules.
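A minimal sketch of this architecture in Java, using the standard javax.xml.transform API. The XML shape of the result set (rows/row/id/name) and the class name are assumptions made for illustration; a real application would generate the XML from the query result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ResultSetToHtml {
    // Hypothetical XML form of a query result; element names are illustrative.
    static final String ROWS_XML =
        "<rows>"
      + "<row><id>1</id><name>Ada</name></row>"
      + "<row><id>2</id><name>Linus</name></row>"
      + "</rows>";

    // All rendering logic lives in the stylesheet, not in the query or the Java code.
    static final String TABLE_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/rows'>"
      + "<table><xsl:apply-templates select='row'/></table>"
      + "</xsl:template>"
      + "<xsl:template match='row'>"
      + "<tr><td><xsl:value-of select='id'/></td><td><xsl:value-of select='name'/></td></tr>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the generated markup.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(ROWS_XML, TABLE_XSLT));
    }
}
```

Changing the table layout here means editing only TABLE_XSLT; the code that produces ROWS_XML stays untouched.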

Give a few examples of types of applications that can benefit from using XML.

There are thousands of applications that can benefit from XML technologies. The point of this question is not to have the candidate rattle off a laundry list of projects that they have worked on, but rather to allow the candidate to explain the rationale for choosing XML by citing a few real-world examples. For instance, one appropriate answer is that XML allows content management systems to store documents independently of their format, which reduces data redundancy. Another answer relates to B2B exchanges or supply chain management systems. In these instances, XML provides a mechanism for multiple companies to exchange data according to an agreed-upon set of rules. A third common response involves wireless applications that require WML to render data on handheld devices.

What is DOM and how does it relate to XML?

The Document Object Model (DOM) is an interface specification maintained by the W3C DOM Workgroup that defines an application-independent mechanism to access, parse, or update XML data. In simple terms, it is a hierarchical model that allows developers to manipulate XML documents easily. Any developer who has worked extensively with XML should be able to discuss the concept and use of DOM objects freely. Additionally, it is not unreasonable to expect advanced candidates to thoroughly understand its internal workings and be able to explain how DOM differs from an event-based interface like SAX.
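As a rough illustration of the hierarchical model, the sketch below loads a document into a DOM tree with the JDK's javax.xml.parsers API, reads from it, and updates it. The library/book vocabulary and the class and method names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomDemo {
    // Builds the complete in-memory tree for the given XML text.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    // Access: count the <book> elements anywhere in the tree.
    static int countBooks(Document doc) {
        return doc.getElementsByTagName("book").getLength();
    }

    // Update: append a new <book> element under the root element.
    static void addBook(Document doc, String title) {
        Element book = doc.createElement("book");
        book.setAttribute("title", title);
        doc.getDocumentElement().appendChild(book);
    }

    public static void main(String[] args) {
        Document doc = parse("<library><book title='A'/></library>");
        addBook(doc, "B");
        System.out.println(countBooks(doc));
    }
}
```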

What is SOAP and how does it relate to XML?

The Simple Object Access Protocol (SOAP) uses XML to define a protocol for the exchange of information in distributed computing environments. SOAP consists of three components: an envelope, a set of encoding rules, and a convention for representing remote procedure calls. Unless experience with SOAP is a direct requirement for the open position, knowing the specifics of the protocol, or how it can be used in conjunction with HTTP, is not as important as identifying it as a natural application of XML.

Can you walk us through the steps necessary to parse XML documents?

Superficially, this is a fairly basic question. However, the point is not to determine whether candidates understand the concept of a parser, but rather to have them walk through the process of parsing XML documents step by step. Determining whether a non-validating or validating parser is needed, choosing the appropriate parser, and handling errors are all important aspects of this process that should be included in the candidate's response.
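One possible walk-through in Java, assuming a DOM parser and DTD validation. The ParseSteps class and the way problems are collected into a list are illustrative choices, not a prescribed approach.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ParseSteps {
    /** Parses the XML and returns the problems found; an empty list means success. */
    static List<String> parse(String xml, boolean validating) {
        List<String> problems = new ArrayList<>();
        try {
            // Step 1: decide whether validation (here against a DTD) is needed.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);

            // Step 2: obtain the parser (a DOM builder in this sketch).
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Step 3: decide how errors are handled before parsing starts.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    problems.add("warning: " + e.getMessage());
                }
                @Override public void error(SAXParseException e) {
                    problems.add("error: " + e.getMessage());   // validity error, parsing continues
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e;                                     // not well-formed, parsing stops
                }
            });

            // Step 4: parse the input.
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add("fatal: " + e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            problems.add("setup or I/O problem: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";
        System.out.println(parse(dtd + "<note><from>Al</from></note>", true));
    }
}
```

With validating set to false, the same invalid document parses without complaint because it is still well-formed, which is the distinction the question is probing.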

Give some examples of XML DTDs or schemas that you have worked with.

Although XML does not require data to be validated against a DTD, many of the benefits of using the technology are derived from being able to validate XML documents against business or technical architecture rules. Polling for the list of DTDs that developers have worked with provides insight into their general exposure to the technology. The ideal candidate will have knowledge of several of the commonly used DTDs such as FpML, DocBook, HRML, and RDF, as well as experience designing a custom DTD for a particular project where no standard existed.

Using XSLT, how would you extract a specific attribute from an element in an XML document?

Successful candidates should recognize this as one of the most basic applications of XSLT. If they are not able to construct a reply similar to the example below, they should at least be able to identify the components necessary for this operation: xsl:template to match the appropriate XML element, xsl:value-of to select the attribute value, and the optional xsl:apply-templates to continue processing the document.
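A reply along these lines might look like the following sketch, which runs a small stylesheet from Java. The catalog/book vocabulary, the isbn attribute, and the class name are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AttributeExtract {
    // Hypothetical input document.
    static final String XML =
        "<catalog>"
      + "<book isbn='111'>First Title</book>"
      + "<book isbn='222'>Second Title</book>"
      + "</catalog>";

    // xsl:template matches the element; xsl:value-of selects its attribute.
    // An xsl:apply-templates inside the template would continue into the
    // children of <book>; it is left out here so only the attribute is emitted.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='book'>"
      + "<xsl:value-of select='@isbn'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the XML and returns the text output.
    static String extract(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(extract(XML, XSLT));
    }
}
```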

When constructing an XML DTD, how do you create an external entity reference in an attribute value?

Every interview session should have at least one trick question. Although possible when using SGML, XML DTDs don't support defining external entity references in attribute values. It's more important for the candidate to respond to this question in a logical way than to know the somewhat obscure answer.

How would you build a search engine for large volumes of XML data?

The way candidates answer this question may provide insight into their view of XML data. For those who view XML primarily as a way to denote structure for text files, a common answer is to build a full-text search and handle the data similarly to the way Internet portals handle HTML pages. Others consider XML a standard way of transferring structured data between disparate systems. These candidates often describe some scheme of importing XML into a relational or object database and relying on the database's engine for searching. Lastly, candidates who have worked with vendors specializing in this area often say that the best way to handle this situation is to use a third-party software package optimized for XML data.

Obviously, some important areas of XML technologies were not included in this list -- namespaces, XPointer, XLink, and so on -- and should be added to the interviewer's set of questions if applicable to the particular position that the candidate is applying for. However, these questions in conjunction with others to assess soft skills (communication skills, ability to work on teams, leadership ability, etc.) will help determine how well candidates understand the fundamental principles of XML.

What is the difference between SAX parser and DOM parser?

DOM parser - reads the whole XML document and returns a DOM tree representation of it. It provides a convenient way to read, analyze, and manipulate XML files. It is not well suited to large XML files, because it always reads the whole file before any processing can begin.
SAX parser - works incrementally and generates events that are passed to the application. It does not build a data representation of the XML content, so the application must do more of the work itself. However, it supports streaming and partial processing, which a DOM parser alone cannot provide.

DOM: creates an internal, in-memory representation of an XML document.

Convenient for smaller XML files, but because the whole document is held in memory, it may be impractical for very large documents.
Good when the application needs a representation of the whole XML document.

SAX: event driven. The parser reads the document from top to bottom and fires an event whenever it encounters certain constructs (tags, attributes, and so on). For example:

a start tag fires an event
an end tag fires an event
the beginning of the document fires an event
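The event sequence can be observed with a short sketch based on the JDK's SAX API. The SaxEvents class and the format of the log entries are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEvents {
    // Records each SAX callback in the order the parser fires it.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() {
                log.add("startDocument");
            }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes attributes) {
                log.add("start:" + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
            @Override public void endDocument() {
                log.add("endDocument");
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [startDocument, start:a, start:b, end:b, end:a, endDocument]
        System.out.println(events("<a><b/></a>"));
    }
}
```

No tree is ever built: whatever the application needs to remember, it has to store itself inside the callbacks.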

What is XPath?

XPath is used to navigate through elements and attributes in an XML document.
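A small sketch using the JDK's javax.xml.xpath API shows both kinds of navigation. The sample document and the XPathDemo class are assumptions for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathDemo {
    // Hypothetical sample document.
    static final String XML =
        "<library>"
      + "<book id='b1'><title>Dune</title></book>"
      + "<book id='b2'><title>Emma</title></book>"
      + "</library>";

    // Evaluates an XPath expression against the XML and returns the string value.
    static String evaluate(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad expression or document", e);
        }
    }

    public static void main(String[] args) {
        // Navigate to an element by way of an attribute test.
        System.out.println(evaluate(XML, "/library/book[@id='b2']/title")); // prints Emma
        // Navigate to an attribute.
        System.out.println(evaluate(XML, "/library/book[1]/@id"));          // prints b1
    }
}
```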

What is XSL?

XSL (Extensible Stylesheet Language) is a family of three languages:

XSLT - a language for transforming XML documents
XPath - a language for navigating in XML documents
XSL-FO - a language for formatting XML documents

XSLT is used to transform an XML document into another XML document, or into another type of document that a browser recognizes, such as HTML or XHTML. Normally XSLT does this by transforming each XML element into an (X)HTML element.

What is the difference between Schema and DTD?

A DTD is: The XML Document Type Declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as a document type definition or DTD. The DTD can point to an external subset containing markup declarations, or can contain the markup declarations directly in an internal subset, or can even do both.

A Schema is: XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content and semantics of XML documents. In summary, schemas are a richer and more powerful way of describing information than what is possible with DTDs.

How Schemas Differ from DTDs

The first, and probably most significant, difference between XML Schemas and XML DTDs is that XML Schemas use XML document syntax. While transforming the syntax to XML doesn’t automatically improve the quality of the description, it does make those descriptions far more extensible than they were in the original DTD syntax. Declarations can have richer and more complex internal structures than declarations in DTDs, and schema designers can take advantage of XML’s containment hierarchies to add extra information where appropriate — even sophisticated information like documentation. There are a few other benefits from this approach. XML Schemas can be stored along with other XML documents in XML-oriented data stores, referenced, and even styled, using tools like XLink, XPointer, and XSL.

The largest addition XML Schemas provide to the functionality of the descriptions is a vastly improved data typing system. XML Schemas provide data-oriented data types in addition to the more document-oriented data types XML 1.0 DTDs support, making XML more suitable for data interchange applications. Built-in datatypes include strings, booleans, and time values, and the XML Schemas draft provides a mechanism for generating additional data types. Using that system, the draft provides support for all of the XML 1.0 data types (NMTOKENS, IDREFS, etc.) as well as data-specific types like decimal, integer, date, and time. Using XML Schemas, developers can build their own libraries of easily interchanged data types and use them inside schemas or across multiple schemas.

The current draft of XML Schemas also declares elements and attributes in a style very different from that of DTDs. In addition to declaring elements and attributes individually, developers can create models (archetypes) that can be applied to multiple elements and refined if necessary. This provides a lot of the functionality SOX had developed to support object-oriented concepts like inheritance. Archetype development and refinement will probably become the mark of the high-end schema developer, much as the effective use of parameter entities was the mark of the high-end DTD developer. Archetypes should be easier to model and use consistently, however.

XML Schemas also support namespaces, a key feature of the W3C’s vision for the future of XML. While it probably wouldn’t be impossible to integrate DTDs and namespaces, the W3C has decided to move on, supporting namespaces in its newer developments and not retrofitting XML 1.0. In many cases, provided that namespace prefixes don’t change or simply aren’t used, DTDs can work just fine with namespaces, and should be able to interoperate with namespaces and schema processing that relies on namespaces. There will be a few cases, however, where namespaces may force developers to use the newer schemas rather than the older DTDs.

How do you parse/validate the XML document?

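Validation is performed while the XML document is parsed. Typically you configure a validating DOM or SAX parser and register an error handler to collect any violations. In Java, the javax.xml.validation API can also validate a document against an XML Schema.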
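As a hedged illustration, the sketch below validates a document against an XML Schema with the JDK's javax.xml.validation package. The order/quantity schema and the XsdValidation class are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidation {
    // Hypothetical schema: an <order> holding one positive integer <quantity>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='order'><xs:complexType><xs:sequence>"
      + "<xs:element name='quantity' type='xs:positiveInteger'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true when the XML is well-formed and valid against XSD.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // The document is parsed and checked in a single pass; with no
            // error handler registered, the first violation raises SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<order><quantity>3</quantity></order>"));  // prints true
        System.out.println(isValid("<order><quantity>-1</quantity></order>")); // prints false
    }
}
```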

What is XML Namespace?

The XML namespaces recommendation defines a way to distinguish between duplicate element type and attribute names. Such duplication might occur, for example, in an XSLT stylesheet or in a document that contains element types and attributes from two different DTDs.

An XML namespace is a collection of element type and attribute names. The namespace is identified by a unique name, which is a URI. Thus, any element type or attribute name in an XML namespace can be uniquely identified by a two-part name: the name of its XML namespace and its local name. This two-part naming system is the only thing defined by the XML namespaces recommendation.

XML namespaces are declared with an xmlns attribute, which can associate a prefix with the namespace. The declaration is in scope for the element containing the attribute and all its descendants. For example:

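    <!-- Declares an XML namespace and associates it with the foo prefix. Its scope is the A and B elements. -->
    <A xmlns:foo="http://www.foo.org/">
       <B>abcd</B>
    </A>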

If an XML namespace declaration contains a prefix, you refer to element type and attribute names in that namespace with the prefix. For example:

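    <!-- A and B are in the http://www.foo.org/ namespace, which is associated with the foo prefix. -->
    <foo:A xmlns:foo="http://www.foo.org/">
       <foo:B>abcd</foo:B>
    </foo:A>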
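The two-part naming described above can be checked from Java with a namespace-aware DOM parser. In this sketch the namespace URI, the element names, and the NamespaceDemo class are all illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // Returns the root element's two-part name as "{namespace-URI}local-name".
    static String rootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in the JDK factory
            Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // A prefixed name and a default-namespace name resolve to the same two-part name.
        System.out.println(rootName("<foo:A xmlns:foo='http://www.foo.org/'><foo:B>abcd</foo:B></foo:A>"));
        System.out.println(rootName("<A xmlns='http://www.foo.org/'><B>abcd</B></A>"));
    }
}
```

Both calls yield {http://www.foo.org/}A, which shows that the prefix is only a shorthand for the namespace name and carries no meaning of its own.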

Can you give me an executive summary of what XML namespaces are not?

They aren’t a cure for cancer, they aren’t a way to win the lottery, and they aren’t a direct cause of world peace. They also aren’t very difficult to understand or use. Two things that XML namespaces are not have caused a lot of confusion, so we’ll mention them here:

XML namespaces are not a technology for joining XML documents that use different DTDs. Although they might be used in such a technology, they don’t provide it themselves.

The URIs used as XML namespace names are not guaranteed to point to schemas, information about the namespace, or anything else — they’re just identifiers. URIs were used simply because they’re a well-known system for creating unique identifiers. Don’t even think about trying to resolve these URIs.

What is an XML template?

* In XSLT, a style sheet describes transformation rules
* A transformation rule: a pattern + a template
* Pattern: a configuration in the source tree
* Template: a structure to be instantiated in the result tree
* When a pattern is matched in the source tree, the corresponding template is instantiated in the result tree

Thursday, June 19, 2008