September 7, 2007

3D Multimodal Interaction in Augmented and Virtual Reality

Filed under: — admin @ 10:54 am

This project focuses on exploring multimodal interaction in immersive environments, particularly on the problem of target disambiguation when selecting an object in 3D. We have created an interactive 3D environment as our test bed and have used it in a variety of augmented reality (AR) and virtual reality (VR) scenarios. In 3D immersive environments, users often face selection problems such as imprecise pointing at a distance, selection of occluded or hidden objects, and recognition errors (e.g., in speech recognition). Our goal is to reduce selection errors by considering multiple input sources and compensating for errors in some of these sources with the results of others. For example, if one object (e.g., a “chair”) is occluded by another (e.g., a “desk”), simple ray-based selection will fail to select the chair because it cannot be seen. However, if the user can specify that the object of interest is a “chair” and that it is “behind the desk,” speech can help disambiguate the pointing gesture and yield the correct result.

In a multimodal environment, each input source (e.g., spoken language) has its own uncertainties. In our system, these are represented as an n-best list for each input source, together with probabilities expressing how certain each prediction is. In addition to spoken language, the sources in our system include 3D gestures and a set of visibility and spatiality perceptors based on our SenseShapes approach.

Our multimodal system fuses symbolic and statistical information from these sources and uses mutual disambiguation of the modalities to improve decision making. It is therefore possible, even likely, that the top choice of an individual recognizer is not the one selected; the choice that best fits all available inputs is more likely to be selected. User studies conducted with the system show that such mutual disambiguation corrections account for over 45% of the successful 3D multimodal interpretations.
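To make the idea concrete, here is a minimal sketch of statistical fusion over n-best lists. It is not the system's actual algorithm: it assumes each source returns a mapping from candidate object to probability and simply multiplies the probabilities across sources (a naive independence assumption), ignoring the symbolic constraints the real system also fuses. The `fuse` function, the `floor` value, and all object names and scores are invented for illustration.

```python
from math import prod

# Hypothetical n-best list: candidate object -> probability from one source.
NBest = dict[str, float]

def fuse(nbest_lists: list[NBest], floor: float = 0.01) -> list[tuple[str, float]]:
    """Rank candidates by the product of their per-source probabilities.

    A candidate missing from a source's n-best list gets a small floor
    probability instead of zero, so one recognizer cannot veto it outright.
    """
    candidates = set().union(*nbest_lists)
    scores = {c: prod(nb.get(c, floor) for nb in nbest_lists) for c in candidates}
    total = sum(scores.values())
    # Normalize so the fused scores sum to 1, then sort best-first.
    return sorted(((c, s / total) for c, s in scores.items()),
                  key=lambda pair: pair[1], reverse=True)

pointing = {"desk": 0.7, "chair": 0.2, "lamp": 0.1}  # ray hits the occluding desk first
speech = {"hair": 0.5, "chair": 0.4, "desk": 0.1}    # "chair" misrecognized as "hair"
spatial = {"chair": 0.8, "lamp": 0.2}                # "behind the desk" perceptor
ranking = fuse([pointing, speech, spatial])
print(ranking[0][0])  # chair
```

Neither the pointing nor the speech recognizer ranks "chair" first, yet it wins once all three sources are combined, which is the effect the mutual disambiguation figures above describe.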

Read more at Columbia University


September 8, 2007

Collective Intelligence: Include The Disabled for Success

Filed under: — admin @ 3:46 pm

Want to be more productive at solving complex problems in groups? Work better in teams? Use all of your resources to their fullest potential, no matter how different some individuals may be perceived to be? Maybe you should study the way some ants bury their dead, the way fireflies in some parts of the world light up in synchrony, or the way honey bees in the field fly from flower to flower, collecting pollen and nectar to make honey. Read on; I am serious!

Have you ever been in a meeting where hardly anyone talked, and only the few outgoing people voiced their opinions? Perhaps you sat back, listened, and thought to yourself: my idea is not valuable because it is so different from all the other ideas being brought up. So you did not speak up, for fear of being different.

I’ll bet you do not realize how much your different point of view helps make the outcome better for everyone! Without your perspective, the whole group may fail because it followed the opinions of only a few of its members, right or wrong.

If you have a disability or another unique viewpoint on the topic, project, or program under discussion, your input is even more crucial to producing the best possible outcome. I might even argue that if you are disabled or face other challenges, your thoughts are more important, since others cannot offer your unique viewpoint or the guidance that comes with it.

The same is true of software and hardware development. You would not want to develop inferior software or hardware products that are not accessible or usable by everyone. In this highly competitive global market, it is best not to limit your customer base to only a portion of the world’s population; doing so lets your competitors gain an advantage you may never be able to overcome. Include people with disabilities; we are brimming with innovative ideas!

With so much room for improvement in the way we currently work together, some groups and organizations have started to look to nature for solutions and new ideas. Nature does well when many members interact with each other and no one individual directs them, as with the ants, fireflies, and bees I mentioned.

Do you wonder how such positive collaboration can happen?
Have you ever heard of swarm, or collective intelligence?

Swarm, or collective, intelligence is, by one definition, a large, self-organized group of computers or people interacting as one, with all individuals fully participating and without infrastructure limitations. It is an emergent behavior: complex group actions arise from simple local rules.

From Stephen Strogatz: Who Cares About Fireflies? We see fantastic examples of synchrony in the natural world all around us. To give an example, there were persistent reports when the first Western travelers went to South East Asia, back to the time of Sir Francis Drake in the 1500s, of spectacular scenes along riverbanks, where thousands upon thousands of fireflies in the trees would all light up and go off simultaneously. These kinds of reports kept coming back to the West, and were published in scientific journals, and people who hadn’t seen it couldn’t believe it. Scientists said that this is a case of human misperception, that we’re seeing patterns that don’t exist, or that it’s an optical illusion. How could the fireflies, which are not very intelligent creatures, manage to coordinate their flashings in such a spectacular and vast way?

The answer to how this can happen is swarm, or collective, intelligence.

In the May 1, 2001, issue of the Harvard Business Review, the article “Swarm Intelligence: A Whole New Way to Think About Business” by Eric Bonabeau and Christopher Meyer is summarized as follows:

What do ants and bees have to do with business? A great deal, it turns out. Individually, social insects are only minimally intelligent, and their work together is largely self-organized and unsupervised. Yet collectively they’re capable of finding highly efficient solutions to difficult problems and can adapt automatically to changing environments. Over the past 20 years, the authors and other researchers have developed rigorous mathematical models to describe this phenomenon, which has been dubbed “swarm intelligence,” and they are now applying them to business. Their research has already helped several companies develop more efficient ways to schedule factory equipment, divide tasks among workers, organize people, and even plot strategy. Emulating the way ants find the shortest path to a new food supply, for example, has led researchers at Hewlett-Packard to develop software programs that can find the most efficient way to route phone traffic over a telecommunications network. Southwest Airlines has used a similar model to efficiently route cargo. To allocate labor, honeybees appear to follow one simple but powerful rule–they seem to specialize in a particular activity unless they perceive an important need to perform another function. Using that model, researchers at Northwestern University have devised a system for painting trucks that can automatically adapt to changing conditions. In the future, the authors speculate, a company might structure its entire business using the principles of swarm intelligence. The result, they believe, would be the ultimate self-organizing enterprise–one that could adapt quickly and instinctively to fast-changing markets.
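The honeybee rule in the summary above can be illustrated with the fixed response-threshold model used in the swarm-intelligence literature; the formula is an addition here, not something the summary states. An individual takes up a task with probability s²/(s² + θ²), where s is the stimulus (how pressing the task is) and θ is that individual's threshold for it. A specialist has a low threshold for its own task and a high one for others, so it switches only when another need becomes strong. The thresholds, stimuli, and function names below are hypothetical.

```python
def response_probability(stimulus: float, threshold: float, n: int = 2) -> float:
    """Probability that an individual takes up a task (fixed-threshold model)."""
    if stimulus <= 0:
        return 0.0
    return stimulus**n / (stimulus**n + threshold**n)

# Hypothetical forager: low threshold for foraging, high threshold for nursing.
thresholds = {"foraging": 1.0, "nursing": 10.0}

def preferred_task(stimuli: dict[str, float]) -> str:
    """Return the task this individual is most likely to respond to."""
    return max(stimuli, key=lambda t: response_probability(stimuli[t], thresholds[t]))

print(preferred_task({"foraging": 2.0, "nursing": 2.0}))   # foraging
print(preferred_task({"foraging": 2.0, "nursing": 50.0}))  # nursing: the need is now urgent
```

No central controller assigns the work: each individual compares local stimuli against its own thresholds, and the division of labor emerges from many such decisions.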

Continue reading the post at Global Dialog Center


September 9, 2007

Metadata vs. Metatags

Filed under: — admin @ 2:33 pm

There appears to be a certain amount of confusion about the terms “metadata” and “meta tags.” I know they have confused me in the past, so I hope this article makes things a little clearer for those who are struggling with these meta-things.

Metadata

As this article is being written for a newsletter focusing on accessibility, let’s start by looking at meta-things in this context. Checkpoint 13.2 of WCAG 1.0 tells us to:

Provide metadata to add semantic information to pages and sites.

What does this mean? Let’s start by defining metadata: metadata is data about data. So, providing metadata for our page (which is data) means that we need some way to describe that data.

How much metadata do we need to provide about our page to satisfy Checkpoint 13.2? There is probably no single correct answer, other than to provide as much useful metadata as possible. The minimum is to provide a value for (X)HTML’s only real metadata element: the <title></title>.

Yes, the <title></title> is metadata: it is data that describes the data that is our page or document. Please note that “Untitled” is not a good value for our <title></title>, nor is it cool or clever to use the site’s title as the <title></title> of every page in the site. (Including the site title, however, is good, as it helps the page make sense when viewed on its own.)

Meta Tags or Meta Elements

What about “meta tags” or “meta elements,” then? Are these not metadata? No. Meta elements (or tags, if you will) are not metadata in themselves; they are (X)HTML elements that allow us to embed metadata in a page.

Let’s look at how we can use meta elements to put metadata into a document, using a couple of well-known examples:
<head>
<title>My page</title>
<meta name="description" content="A page I wrote about some stuff." />
<meta name="keywords" content="Smiffy, stuff, things" />
</head>

In the example above, we are presenting three pieces of metadata: the page title, some information called “description,” and some information called “keywords.” The description gives a brief summary of what the page is about, and the keywords a comma-separated list of terms relevant to the page. Whilst description and keywords are often published, how much they are actually used is debatable. There was a time when search engines might have taken note of them, but from what I have been reading recently, they are largely ignored for the simple reason that much metadata of this type cannot be trusted. The description and keywords values may be of use if your organisation has an in-house search engine that can make use of them, but there are better things available, as we will see shortly.
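As an illustration of how an in-house search engine might read these values, here is a minimal sketch using Python's standard-library `html.parser` to collect the title text and the name/content pairs of meta elements. The `MetaCollector` class and the sample page are illustrative, not part of any existing tool.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect <title> text and <meta name=... content=...> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> also arrives here via handle_startendtag.
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

page = """<head>
<title>My page</title>
<meta name="description" content="A page I wrote about some stuff." />
<meta name="keywords" content="Smiffy, stuff, things" />
</head>"""

parser = MetaCollector()
parser.feed(page)
keywords = [k.strip() for k in parser.meta["keywords"].split(",") if k.strip()]
print(parser.title)  # My page
print(keywords)      # ['Smiffy', 'stuff', 'things']
```

Nothing here checks whether the values are truthful, which is exactly why public search engines have learned to distrust them.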

I have seen many other terms added as metadata, for instance “author.” Great: it’s good to identify the author of a document, but information embedded in meta elements is really for machines to read, not humans. Besides, can we all agree that the creator or writer of a document is called an “author”? Probably not.

The problem that we have here is that we are not working to a formal metadata scheme. If we make up our own terms, they will probably only be of use to us, and only then if we have our in-house search engine as mentioned earlier.

Formal Metadata Schemes

If we really want to make our metadata useful, we need to agree on what the terms are (description and keywords may be common, but they are informal). If we all say that the person who creates a document is a “creator”, then my in-house search engine can look at your documents and know who wrote them because we have both used the same term - and your in-house search engine can look at my documents and make sense of them in the same way.

Where we can get really clever is when I use a set of terms that you may not be familiar with, but I also provide you with a link to something called a schema that defines those terms. Your software can then run off and look at the schema and come back and tell you what my metadata means. This is where we start to touch on what is known as the Semantic Web.

Dublin Core: A Formal Metadata Scheme

Let’s now look at a metadata scheme called Dublin Core[2]. Some people might even call Dublin Core the formal metadata scheme, as it is registered with the International Organization for Standardization as ISO 15836. There are two parts to Dublin Core: the 15 Elements, which constitute ISO 15836, and the Terms, which give us much more scope in what we can describe. Note that Dublin Core metadata isn’t just about describing Web content; it can describe physical objects, events, services, and more.

Rather than go into the boring theory of Dublin Core metadata, let’s get our hands dirty and look at a practical example; this is a selection of the Dublin Core metadata used to describe the document you are currently reading:

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.language" scheme="DCTERMS.RFC1766" content="en" />
<meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
<meta name="DC.format" scheme="DCTERMS.IMT" content="text/html; charset=UTF-8" />
<meta name="DC.title" lang="en" content="Smiffy’s Place: Metadata, Meta Tags, Meta What?" />
<meta name="DC.creator" content="Matthew Smith (Smiffy)" />
<meta name="DC.identifier" content="http://www.smiffysplace.com/metadata-meta-tags-meta-what" />
<meta name="DCTERMS.license" content="http://creativecommons.org/licenses/by-nc-sa/3.0/" />
<meta name="DC.rights" content="(C) Copyright 2001-2007 Matthew Steven Smith" />
<meta name="DC.description" content="Article for the GAWDS newsletter clarifying the differences between metadata and meta tags and what the two are actually for." />
<meta name="DC.subject" content="Accessibility;Dublin Core;HTML;Technical;XHTML;adaptability;metadata;namespace;scheme" />
<meta name="DCTERMS.created" scheme="DCTERMS.W3CDTF" content="2007-08-14" />

Rather than describing every element, I will point out a few items of interest; you can read up on the elements and terms at the Dublin Core site.

Firstly, what are those first two links there for? They point to the schemas for the Dublin Core Elements (DC) and Terms (DCTERMS). If you don’t understand my metadata, you can follow these links to the schemas to find out how it works, or at least your software can.

The first meta element, DC.language, has more than the usual name and content attributes: it also has an attribute called “scheme.” This means that the value of our content is part of a formal, controlled vocabulary. Only values from this vocabulary may appear in the content, so we are all speaking the same language; no making up your own.
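What a controlled vocabulary buys the consumer of the metadata can be shown with a minimal sketch that checks a DC.type value against the DCMI Type Vocabulary named by the `DCTERMS.DCMIType` scheme in the example above. The twelve terms listed are those published by DCMI (check the Dublin Core site for the current list); the `check_dc_type` function is hypothetical.

```python
# Terms of the DCMI Type Vocabulary (as published by DCMI).
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})

def check_dc_type(scheme: str, content: str) -> bool:
    """Accept a DC.type value only if it comes from the declared vocabulary."""
    if scheme != "DCTERMS.DCMIType":
        raise ValueError(f"no vocabulary registered for scheme {scheme!r}")
    return content in DCMI_TYPES

print(check_dc_type("DCTERMS.DCMIType", "Text"))     # True
print(check_dc_type("DCTERMS.DCMIType", "Webpage"))  # False: a made-up term
```

Because everyone draws from the same list, software written by someone else can make the same check without ever having seen your documents.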

The element DC.title has yet another new attribute: lang="en". This means that the title I am presenting is in English. I could have several DC.title values, each with a different lang attribute, allowing me to present a multilingual version of the metadata.

Not forgetting that we started off talking about description and keywords, Dublin Core covers these too: description becomes DC.description and keywords becomes DC.subject. If you already have description and keywords meta elements in your documents, you can easily convert them to the equivalent Dublin Core terms by a) renaming them and b) changing the commas in keywords to semicolons in DC.subject. Easy!
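The conversion is easy to script. Here is a minimal sketch, assuming keywords are comma-separated; beyond the two manual steps, it also strips surrounding spaces and drops empty entries such as the one a trailing comma leaves. Both function names are hypothetical.

```python
def keywords_to_dc_subject(keywords: str) -> str:
    """Turn a comma-separated keywords value into a semicolon-separated DC.subject."""
    terms = [term.strip() for term in keywords.split(",")]
    return ";".join(term for term in terms if term)

def to_dublin_core(name: str, content: str) -> tuple[str, str]:
    """Rename description/keywords meta elements to their Dublin Core equivalents."""
    if name == "description":
        return "DC.description", content
    if name == "keywords":
        return "DC.subject", keywords_to_dc_subject(content)
    return name, content  # anything else is left unchanged

print(to_dublin_core("keywords", "Smiffy, stuff, things,"))
# ('DC.subject', 'Smiffy;stuff;things')
```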

I would encourage readers to go through the code example above in conjunction with the documentation on the Dublin Core site.

Continue reading the post by Matthew Smith for GAWDS (Guild of Accessible Web Designers)


September 12, 2007

Flickr photo collection links added to DBpedia

Filed under: — admin @ 7:46 am

Christian Becker (Freie Universität Berlin) has implemented a wrapper around Flickr that generates photo collections depicting DBpedia concepts. See flickr wrappr for details. We have interlinked all DBpedia concepts with the corresponding photo collections. You can now use any Semantic Web browser to navigate from a DBpedia concept to Flickr photos depicting it by following the dbpedia:hasPictureCollection property. This makes an additional 30-50 million photos accessible through DBpedia.
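Following the link amounts to a single lookup over RDF triples. Here is a minimal sketch using an in-memory list of (subject, predicate, object) tuples in place of a real triple store or Semantic Web browser. The property name dbpedia:hasPictureCollection comes from the announcement above; the photo-collection URLs are hypothetical placeholders, not real wrapper addresses.

```python
Triple = tuple[str, str, str]

# Hypothetical sample data; real wrapper URLs will differ.
TRIPLES: list[Triple] = [
    ("http://dbpedia.org/resource/Berlin", "rdfs:label", "Berlin"),
    ("http://dbpedia.org/resource/Berlin", "dbpedia:hasPictureCollection",
     "http://example.org/photos/Berlin"),
    ("http://dbpedia.org/resource/Brandenburg_Gate", "dbpedia:hasPictureCollection",
     "http://example.org/photos/Brandenburg_Gate"),
]

def objects(triples: list[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects(TRIPLES, "http://dbpedia.org/resource/Berlin", "dbpedia:hasPictureCollection"))
# ['http://example.org/photos/Berlin']
```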

Improving Wikipedia

Wikipedia is by far the largest publicly available encyclopedia on the Web. Wikipedia editions are available in over 250 languages, with the English edition accounting for more than 1.95 million articles. A problem with Wikipedia is that its search capabilities are limited to full-text search, which allows only very limited access to this valuable knowledge base.

Semantic Web

Semantic Web technologies make it possible to run expressive queries against structured information on the Web and to interlink data between different Web data sources. The problem with the Semantic Web is that there is not much RDF data online yet, and up-to-date terms and ontologies are missing for many application domains.

DBpedia

The DBpedia project approaches both problems by extracting structured information from Wikipedia and by making this information available on the Semantic Web.

The DBpedia dataset currently provides information about more than 1.95 million “things,” including at least 80,000 persons, 70,000 places, 35,000 music albums, and 12,000 films. Altogether, the DBpedia dataset consists of 103 million pieces of information (RDF triples).

Source: DBpedia


September 13, 2007

Multimodal Interaction Design, Implementation and Evaluation

Filed under: — admin @ 11:15 pm

Spoken natural language may appeal to the general public, since it is the main modality used, together with pointing gestures or gaze, in face-to-face human communication. Our work on multimodal human-computer interaction is based on the following two observations. On the one hand, speech- and gesture-based multimodal input has been studied extensively, from both a software and an ergonomic point of view. However, speech plus graphics as a form of multimodal output has received less research attention, especially regarding the utility and usability of speech as a supplementary modality to graphics.

On the other hand, pointing hand gestures have the same limited expressive power as gaze in some contexts of use, namely the selection of objects on very large displays (e.g., electronic blackboards, reality centres, or CAVEs) or in 3D environments. In these interaction environments, both modalities can only specify directions if used spontaneously, as in real life. Our current work on multimodality addresses the following three issues:

  1. How to design oral messages that help visual search in cluttered displays?
  2. How to design multimodal command languages that use information on spontaneous or controlled gaze movements to disambiguate oral commands, especially those including deictic phrases?
  3. Oral assistance to visual search

Concerning the effectiveness of oral support for visual search, the detailed presentation of our first study has been accepted as a chapter in a collective scientific book published by Kluwer [39]. This study focused on determining whether oral information on the location of a visual target in a complex, cluttered display could improve the efficiency of its identification (accuracy and selection times). Targets were either familiar (visual presentation of the isolated target prior to scene display) or unfamiliar (oral characterisation of the target only, prior to scene display), and either monomodal (visual or oral) or multimodal (visual and oral).

This initial study was followed up last year by two more ambitious experimental studies. The first experiment, which involved 24 participants, investigated the influence of oral help messages on the speed and accuracy of visual target detection. Message content merely specified the location of the target as one of nine pre-defined areas on the screen. The effectiveness of this form of oral assistance was assessed for various spatial display layouts. 3,600 photographs of real landscapes, people, and objects were selected from a database of over 6,000 items, then formatted and divided into 120 thematically homogeneous collections (30 photographs per collection). These collections were displayed using four spatial layouts (40 collections/scenes per spatial layout): elliptical, radial, matrix-like, and random. To refine the results on participants’ performance (especially target detection accuracy and selection time), we performed a complementary experiment in which the eye movements of 5 participants were recorded (using an ASL-501 eye tracker) during visual tasks similar to those of the second experiment, save for the presence of oral assistance. Eye movements were analysed using dedicated software. Results of both studies are detailed in Suzanne Kieffer’s PhD thesis [15]; see also [19] for the second study, and [27] and [28] for the third. These results are part of our contribution to the Micromégas Project (a three-year research project, started in July 2003, in collaboration with the In Situ team at UR INRIA-Futurs (Orsay) and the Laboratoire de la Perception et du Mouvement in Marseille; it benefits from national support: ACI “Masses de données,” 1st call, 2003).
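The oral help messages in the first experiment only named one of nine pre-defined screen areas. Here is a minimal sketch of how a target position could be turned into such a message, assuming the nine areas form an equal 3 × 3 grid. The area names, message wording, and default 1280 × 1024 screen size are all invented; the study's actual area definitions and phrasing are not given here.

```python
ROWS = ("top", "middle", "bottom")
COLS = ("left", "centre", "right")

def area_name(x: float, y: float, width: float, height: float) -> str:
    """Name the 3x3 grid cell containing point (x, y), with the origin at top-left."""
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("target lies outside the display")
    col = COLS[int(3 * x / width)]
    row = ROWS[int(3 * y / height)]
    return "centre" if (row, col) == ("middle", "centre") else f"{row} {col}"

def help_message(x: float, y: float, width: float = 1280, height: float = 1024) -> str:
    """Hypothetical wording of an oral location hint."""
    return f"The target is in the {area_name(x, y, width, height)} of the screen."

print(help_message(100, 900))  # The target is in the bottom left of the screen.
```

A message this coarse narrows the search to about a ninth of the display without describing the target itself, which matches the "merely specified the location" condition above.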

Continue to read the full research


Multimodal Interaction with a Wearable Augmented Reality System

Filed under: — admin @ 11:18 pm

Wearable computers and their novel applications demand more context-specific user interfaces than traditional desktop paradigms can offer. This article describes a multimodal interface and explains how it enhances a mobile user’s situational awareness and how it provides new functionality.

This mobile augmented-reality system visualizes otherwise invisible information encountered in urban environments. A versatile filtering tool allows interactive display of occluded infrastructure and of dense data distributions, such as room temperature or wireless network strength, with applications for building maintenance, emergency response, and reconnaissance missions. To control this complex application functionality in the real world, the authors combine multiple input modalities (vision-based hand gesture recognition, a 1D tool, and speech recognition) with three late integration styles to provide intuitive and effective input means. The system is demonstrated in a realistic indoor and outdoor task environment, and preliminary user experiences are described. The authors postulate that novel interaction metaphors must be developed together with user interfaces that are capable of controlling them.

Source: Computer.org


Multimodal Interaction for Pedestrians

Filed under: — admin @ 11:19 pm

What are the most suitable interaction paradigms for navigational and informative tasks for pedestrians? Is there an influence of social and situational context on multimodal interaction?

A new study takes a closer look at a multimodal system on a handheld device that was recently developed as a prototype for mobile navigation assistance. The system allows visitors to a city to navigate, get information on sights, and use and manipulate map information. In an outdoor evaluation, we studied the usability of such a system on site. The study yields insight into how multimodality can enhance the usability of handheld devices and their future services. It shows, for example, that for our more complicated tasks, multimodal interaction is superior to classical unimodal interaction.

Source: ACM


The Web Accessibility Initiative of the World Wide Web Consortium

Filed under: — admin @ 11:21 pm

Each year on the UW-Madison campus, 1,500 students and prospective applicants receive services from the McBurney Disability Resource Center, and the actual number of students with disabilities is believed to be much higher. To access Web-based information, some of these students must use assistive devices in place of traditional computer keyboards and mice. Some rely on screen readers such as JAWS to convert text to speech, or on speech recognition programs such as Dragon NaturallySpeaking to control the computer by voice. Others require text transcripts of audio materials.

When we make accommodations for students with disabilities, the rest of the student body often benefits. Throughout Madison and other Wisconsin cities, sidewalk curb cut-outs benefit not only wheelchair users, but also cyclists, rollerbladers and stroller pushers. On the Internet, accessible Web pages provide meaningful content not only for people with disabilities, but also those using a wide variety of hardware and software from a wide variety of locations. For example, when we provide text transcripts for audio materials, the materials become more accessible to students with hearing impairments, students who speak English as a second language, and students using computers or devices without sound cards or speakers.

As the number of Internet-capable devices such as digital phones and personal digital assistants increases, so will demand for access to web content regardless of location or display device. The Web Accessibility Initiative of the World Wide Web Consortium provides an excellent starting point for anyone who wants or needs to learn more about creating universally accessible web pages.

The World Wide Web Consortium (W3C) is a large international organization with members from industry, government, and education. W3C members work together to develop standards for web developers and users. Through its Web Accessibility Initiative (WAI), the W3C seeks to make web content accessible to people regardless of their disabilities.

The WAI web site is aimed at a broad audience, including people who develop Web content, HTML authoring tools, and “user agents” (software for accessing Web documents). Several knowledge levels are supported, from beginner through expert.

The WAI web site is itself a model of accessible content. It is well organized and easy to navigate within and between documents at the site. Major sections include Resources on Web Accessibility; Events, News and History; Involvement and Information; About the WAI Team; and Sponsors. If you are new to the concept of accessible Web pages, head to the section called Resources on Web Accessibility, then Easy Introductions. There you’ll find a brief slide show called Overview of the Web Accessibility Initiative. After viewing the slide show, you may wish to view Checkpoints for Web Content.

At the heart of the WAI are the guidelines and techniques, which can be found at http://www.w3c.org/wai in the section called Resources on Web Accessibility. Each checkpoint is assigned a priority from one to three, depending on whether a Web content developer must, should, or may satisfy it. Topics covered by the guidelines include the best use of color and tables, equivalent alternatives to auditory and visual content, and clear navigation mechanisms. Each topic includes checkpoints and techniques to help you create accessible content.
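In WCAG 1.0 the priorities also determine conformance levels. Satisfying all Priority 1 checkpoints gives Level A, satisfying Priorities 1 and 2 gives Level AA, and satisfying all three gives Level AAA. Here is a minimal sketch of that rule, with a hypothetical function that takes the priorities of the checkpoints a page still fails (for example, as reported by an evaluation tool):

```python
from typing import Optional

# What each WCAG 1.0 priority means for a Web content developer.
REQUIREMENT = {1: "must", 2: "should", 3: "may"}

def conformance_level(failed_priorities: set[int]) -> Optional[str]:
    """WCAG 1.0 conformance level, given the priorities of failed checkpoints."""
    if 1 in failed_priorities:
        return None   # a Priority 1 failure rules out any conformance claim
    if 2 in failed_priorities:
        return "A"
    if 3 in failed_priorities:
        return "AA"
    return "AAA"

print(REQUIREMENT[1], conformance_level({3}))  # must AA
```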

By now you may be wondering how accessible your existing Web pages are. To find out, head to the WAI web site and the section called Resources on Web Accessibility, then Reference Links. There you’ll find a link to Evaluation and Repair Tools. Follow that link, then scroll down the page until you see a link to Bobby. At the Bobby site you can enter the URL of any web page and receive almost instant feedback about its overall accessibility. With feedback from Bobby and guidelines from WAI, you can begin to create access for everyone, everywhere.

Copyright note: W3C is a registered trademark of the Massachusetts Institute of Technology, Institut National de Recherche en Informatique et en Automatique, and Keio University on behalf of the World Wide Web Consortium.

Source: UWSA


Overview of Multimodal Portlet Tooling

Filed under: — admin @ 11:22 pm

The Multimodal Toolkit adds extensions to Rational Application Developer to provide multimodal functionality, allowing the development of applications with both visual and voice user interfaces. The Toolkit provides an integrated development environment that can minimize both the skills and time needed to develop applications for XHTML+Voice-aware PDAs and other handheld wireless devices.

XHTML+Voice (X+V) is a Web markup language for developing multimodal applications for wireless devices. X+V uses both voice and visual elements of user interaction. As devices become smaller, modes of interaction other than keyboard and stylus have become a necessity. In particular, devices like cell phones and PDAs serve many functions, and contain sufficient processing power to handle a variety of tasks.

Multimodal access is the ability to combine more than one mode of interaction, including speech recognition, keyboard, touch screen, and stylus. Depending on the situation and the device, a combination of input modes will make using a small device easier. For example, in a Web browser on a PDA, you can select items by tapping or by providing spoken input. Similarly, you can use voice or stylus to enter information into a field. With multimodal technology, information on the device can be both displayed and spoken.

To download the Multimodal Toolkit, go to the Multimodal Toolkit download site.


September 15, 2007

Animated Portable Network Graphics

Filed under: — admin @ 9:14 pm

APNG is an extension of the PNG format, adding support for animated images. It is intended to be a replacement for simple animated images that have traditionally used the GIF format, while adding support for 24-bit images and 8-bit transparency. APNG is a simpler alternative to MNG, providing a spec suitable for the most common usage of animated images on the Internet. APNG is backwards-compatible with PNG; any PNG decoder should be able to decode the first frame of an APNG and treat it as a normal single-frame PNG.

PNG Structure

An APNG stream is a normal PNG stream as defined in the PNG Specification, with two additional chunks describing the animation and providing additional frame data. The first frame of an animation, frame 0, is encoded as a normal PNG. This frame is what decoders that do not understand the APNG chunks will display.

The size of the first frame defines the boundaries of the entire animation; hence, if extra space will be needed for later frames that is unused in the first frame, the first frame should be appropriately padded with fully transparent regions.

To be recognized as an APNG, an aCTL chunk should appear in the stream before any IDAT chunks. If an aCTL appears after an IDAT chunk, a decoder must ignore it and treat the stream as a single-frame PNG. The aCTL structure is described in the next section.

An fCTL chunk must also appear before IDAT, providing frame information for the first frame encoded in the PNG stream’s IDAT chunks, known as frame 0.
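Here is a minimal sketch of these ordering rules: it walks the chunks of a PNG byte stream with Python's `struct` module and treats the stream as animated only if both aCTL and fCTL appear before the first IDAT. What to do when aCTL is present but fCTL is missing is a decoder policy this text does not specify; the sketch simply refuses. It uses the chunk names of this draft (the published APNG specification later settled on acTL, fcTL, and fdAT), does not verify CRCs, and builds its example stream with placeholder payloads. All function names are hypothetical.

```python
import struct
import zlib
from collections.abc import Iterator

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def chunks(data: bytes) -> Iterator[tuple[str, bytes]]:
    """Yield (type, payload) for each chunk; CRCs are skipped, not checked."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype.decode("ascii"), data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + payload + CRC

def is_apng(data: bytes) -> bool:
    """True if aCTL and fCTL both appear before the first IDAT chunk."""
    seen = set()
    for ctype, _ in chunks(data):
        if ctype == "IDAT":
            break
        seen.add(ctype)
    return {"aCTL", "fCTL"} <= seen

def make_chunk(ctype: bytes, payload: bytes = b"") -> bytes:
    """Build one chunk for the example: length, type, payload, CRC."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)

# Placeholder payloads: only the chunk order matters here.
header = PNG_SIGNATURE + make_chunk(b"IHDR", bytes(13))
animated = header + make_chunk(b"aCTL", bytes(4)) + make_chunk(b"fCTL") \
    + make_chunk(b"IDAT") + make_chunk(b"IEND")
late = header + make_chunk(b"IDAT") + make_chunk(b"aCTL", bytes(4)) + make_chunk(b"IEND")
print(is_apng(animated), is_apng(late))  # True False
```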

Subsequent frames are encoded in aDAT chunks, which contain fCTL chunks with information about the placement and rendering of each frame, as well as frame data encoded as normal PNG streams. The full layout of the aDAT chunk is described below.

aCTL: The Animation Control Chunk

Note: For purposes of chunk descriptions, an “unsigned int” shall be a 32-bit unsigned integer in network byte order; a “signed int” shall be a 32-bit signed integer in network byte order; an “unsigned short” shall be a 16-bit unsigned integer in network byte order; a “byte” shall be an 8-bit unsigned integer.
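In Python's `struct` module, these four types correspond to the format codes below, where `>` selects big-endian (network) byte order with standard sizes. Here is a small sketch; the `read_field` helper is hypothetical.

```python
import struct

# Format codes for the chunk-description types; ">" = network (big-endian) order.
FORMATS = {
    "unsigned int": ">I",    # 32-bit unsigned
    "signed int": ">i",      # 32-bit signed
    "unsigned short": ">H",  # 16-bit unsigned
    "byte": ">B",            # 8-bit unsigned
}

def read_field(kind: str, data: bytes, offset: int = 0) -> int:
    """Decode one field of the given type starting at offset."""
    (value,) = struct.unpack_from(FORMATS[kind], data, offset)
    return value

print(read_field("signed int", b"\xff\xff\xff\xfe"))    # -2
print(read_field("unsigned int", b"\x00\x00\x01\x00"))  # 256
```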

The aCTL chunk is an ancillary chunk as defined in the PNG Specification. It must appear before an IDAT chunk within a valid PNG stream.

Format:

Byte  Field           Type          Description
0     num_iterations  unsigned int  Number of times to loop this APNG; 0 indicates infinite looping.

num_iterations indicates the number of iterations that this animation should play; if it is 0, the animation should play indefinitely. If nonzero, the animation should come to rest on the final non-skipped frame at the end of the last iteration.
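Reading this field takes a single big-endian unpack. Here is a minimal sketch, assuming `actl_payload` holds the aCTL chunk data without its length, type, or CRC; the `play_count` function is hypothetical.

```python
import math
import struct

def play_count(actl_payload: bytes) -> float:
    """Number of times to play the animation; math.inf when num_iterations is 0."""
    (num_iterations,) = struct.unpack_from(">I", actl_payload, 0)
    return math.inf if num_iterations == 0 else num_iterations

print(play_count(bytes(4)))             # inf
print(play_count(b"\x00\x00\x00\x03"))  # 3
```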

aDAT: The Animation Data Chunk

The aDAT chunk contains a stream of images to be used as frames for the animation. All aDAT chunks must follow the last IDAT chunk.

Format:

byte offset
0 sequence_number (unsigned int) Sequence number of this aDAT chunk, starting with 0
4 remaining stream data

The sequence_number shall begin with 0 for the first aDAT chunk in the PNG stream and increase by one with each subsequent aDAT chunk. The length of the stream data is the length of the aDAT chunk minus the 4 bytes of the sequence_number. If only one aDAT chunk is present in the PNG stream, all frames must be encoded within its data section. If more than one aDAT chunk is present, the first chunk (that is, the chunk with sequence_number 0) must have empty stream data, so that the decoder can check for the most common out-of-order errors by looking at the sequence_number of the next chunk. All aDAT chunks must be adjacent in the PNG stream.
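The sequence_number rules can be checked mechanically. This hypothetical Python helper takes the data fields of the adjacent aDAT chunks in the order they appear, rejects anything out of order (the text allows a decoder to treat that as an error instead of reordering), and concatenates the stream data:

```python
import struct
from typing import List, Tuple


def split_adat(data: bytes) -> Tuple[int, bytes]:
    """Split one aDAT chunk's data field into (sequence_number, stream_data)."""
    if len(data) < 4:
        raise ValueError("aDAT data shorter than its sequence_number")
    (seq,) = struct.unpack(">I", data[:4])
    return seq, data[4:]  # stream data = chunk length minus 4 bytes


def join_adat_chunks(chunks: List[bytes]) -> bytes:
    """Validate adjacent aDAT data fields and return their joined stream data."""
    parts = [split_adat(c) for c in chunks]
    for expected, (seq, _) in enumerate(parts):
        if seq != expected:
            raise ValueError(f"aDAT sequence_number {seq}, expected {expected}")
    if len(parts) > 1 and parts[0][1]:
        raise ValueError("first of several aDAT chunks must have empty stream data")
    return b"".join(stream for _, stream in parts)
```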

The stream data within the aDAT consists of a stream of PNG chunks. Each frame begins with an fCTL chunk, followed by an IHDR, then any other normal PNG image chunks, and finally one or more IDAT chunks providing the data for the frame. No IEND is written for these embedded streams; instead, the next fCTL chunk indicates that the current frame has ended and provides control information for the next frame. The end of the final frame is indicated by the end of the aDAT data section, followed by a PNG chunk that is not another aDAT. (Open question: should there be an explicit fEND chunk or similar here?)
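A sketch of splitting the embedded chunk stream into frames, assuming the embedded chunks use the ordinary PNG chunk layout (4-byte length, 4-byte type, data, 4-byte CRC). CRC verification is deliberately omitted here, and both function names are illustrative:

```python
import struct
from typing import List, Tuple

Chunk = Tuple[bytes, bytes]  # (chunk type, chunk data)


def read_chunks(stream: bytes) -> List[Chunk]:
    """Parse a byte string of PNG chunks; CRCs are skipped, not checked."""
    chunks: List[Chunk] = []
    pos = 0
    while pos < len(stream):
        if pos + 8 > len(stream):
            raise ValueError("truncated chunk header")
        length, ctype = struct.unpack(">I4s", stream[pos:pos + 8])
        end = pos + 8 + length + 4  # header + data + CRC
        if end > len(stream):
            raise ValueError("truncated chunk")
        chunks.append((ctype, stream[pos + 8:pos + 8 + length]))
        pos = end
    return chunks


def group_frames(chunks: List[Chunk]) -> List[List[Chunk]]:
    """Group embedded chunks into frames; every fCTL starts a new frame.

    The last frame simply runs to the end of the aDAT stream data.
    """
    frames: List[List[Chunk]] = []
    for ctype, data in chunks:
        if ctype == b"fCTL":
            frames.append([])
        elif not frames:
            raise ValueError("aDAT stream data must begin with an fCTL chunk")
        frames[-1].append((ctype, data))
    return frames
```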

Each frame must have the same color type as the parent stream. Each frame’s region, as defined by its width and height (specified in its IHDR) and its x and y offsets (specified in its fCTL, described below), must lie within the canvas of the first image.
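These two constraints translate into a simple check. In this hypothetical Python helper, the color type, width, and height are assumed to have been read from the frame's IHDR and the offsets from its fCTL; offsets are unsigned, so only the right and bottom edges need testing:

```python
def check_frame(parent_color_type: int, canvas_w: int, canvas_h: int,
                color_type: int, x: int, y: int, w: int, h: int) -> None:
    """Raise ValueError if a frame breaks the color-type or region rules."""
    if color_type != parent_color_type:
        raise ValueError("frame color type differs from the parent stream")
    if x + w > canvas_w or y + h > canvas_h:
        raise ValueError("frame region extends past the first image's canvas")
```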

Encoders are strongly encouraged to use a single aDAT chunk whenever possible, to remove the chance of out-of-order chunks. Should a decoder receive an APNG stream with missing or out-of-order aDAT chunks, it is under no obligation to attempt to reorder the chunks and may treat that case as an error condition.

fCTL: The Frame Control Chunk

Format:

byte offset
0 x_offset (unsigned int) X position at which to render this frame
4 y_offset (unsigned int) Y position at which to render this frame
8 delay_num (unsigned short) Frame delay fraction numerator
10 delay_den (unsigned short) Frame delay fraction denominator
12 render_op (byte) Flags controlling disposal, blending, and skipping of this frame
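Because every field has a fixed size, the fCTL data can be unpacked with Python's `struct` module. The `>` prefix selects network byte order with no alignment padding, so the format below is exactly 13 bytes. The `FrameControl` type and `parse_fctl` name are hypothetical conveniences, not part of the format:

```python
import struct
from typing import NamedTuple


class FrameControl(NamedTuple):
    x_offset: int
    y_offset: int
    delay_num: int
    delay_den: int
    render_op: int


# Two unsigned ints, two unsigned shorts, one byte: 4 + 4 + 2 + 2 + 1 = 13 bytes.
FCTL_FORMAT = ">IIHHB"


def parse_fctl(data: bytes) -> FrameControl:
    """Unpack an fCTL chunk's data field."""
    if len(data) != struct.calcsize(FCTL_FORMAT):
        raise ValueError("fCTL data must be 13 bytes")
    return FrameControl(*struct.unpack(FCTL_FORMAT, data))
```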

The delay_num and delay_den parameters together specify a fraction giving the delay after the current frame, in seconds. If the denominator is 0, it is treated as if it were 100 (that is, delay_num then specifies hundredths of a second). If the numerator is 0, the decoder should render the next frame as quickly as possible, though viewers may impose a reasonable lower bound on the delay.
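The delay rule as a small Python function, using `Fraction` to keep the value exact. The `min_delay` parameter stands in for the optional viewer-imposed lower bound; its name, its default of 0, and applying it only when delay_num is 0 are assumptions of this sketch:

```python
from fractions import Fraction


def frame_delay(delay_num: int, delay_den: int,
                min_delay: Fraction = Fraction(0)) -> Fraction:
    """Delay after the current frame, in seconds."""
    if delay_num == 0:
        # "As quickly as possible", subject to any viewer-chosen floor.
        return min_delay
    # A zero denominator is read as 100, i.e. hundredths of a second.
    return Fraction(delay_num, delay_den or 100)
```

For example, `frame_delay(5, 0)` yields 5/100 = 1/20 of a second.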

The frame is rendered within the region defined by the x_offset and y_offset from the fCTL, and the width and height from this frame’s IHDR. For frame 0, the x_offset and y_offset fields must be 0. Should parts of the region fall outside the canvas defined by frame 0, rendering is to be clipped to that canvas.
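Clipping a frame's region to the frame-0 canvas is a rectangle intersection. A minimal sketch with a hypothetical function name; since the offsets are unsigned, only the right and bottom edges can overflow:

```python
from typing import Optional, Tuple


def clip_region(x: int, y: int, w: int, h: int,
                canvas_w: int, canvas_h: int) -> Optional[Tuple[int, int, int, int]]:
    """Return the visible (x, y, width, height), or None if nothing is visible."""
    right = min(x + w, canvas_w)
    bottom = min(y + h, canvas_h)
    if right <= x or bottom <= y:
        return None  # the frame lies entirely outside the canvas
    return x, y, right - x, bottom - y
```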

The render_op parameter contains flags describing how the frame is to be disposed of before the next frame is rendered; it also specifies whether the frame is to be alpha-blended into the current canvas contents or should completely replace its region of the canvas. Valid render_op flags are:

bit
+-------------------------------+
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+-------------|---|---|---|---|-+
              |   |   +---+---+--- bits 0-2: dispose_op
              |   |
              |   +--------------- bit 3: APNG_RENDER_OP_BLEND_FLAG
              |
              +------------------- bit 4: APNG_RENDER_OP_SKIP_FRAME
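Decoding render_op is a matter of masking bits. In this Python sketch the mask constants and `decode_render_op` are hypothetical names. Rejecting set reserved bits follows the rule that bits 5 through 7 must be 0, and rejecting dispose_op values above 2 assumes that only the values listed in this section (0 to 2) are valid:

```python
from typing import NamedTuple

DISPOSE_MASK = 0b0000_0111   # bits 0-2: dispose_op
BLEND_FLAG = 0b0000_1000     # bit 3: APNG_RENDER_OP_BLEND_FLAG
SKIP_FRAME = 0b0001_0000     # bit 4: APNG_RENDER_OP_SKIP_FRAME
RESERVED_MASK = 0b1110_0000  # bits 5-7: reserved, must be 0


class RenderOp(NamedTuple):
    dispose_op: int
    blend: bool
    skip: bool


def decode_render_op(render_op: int) -> RenderOp:
    """Split a render_op byte into its dispose, blend, and skip parts."""
    if render_op & RESERVED_MASK:
        raise ValueError("reserved render_op bits 5-7 are set")
    dispose = render_op & DISPOSE_MASK
    if dispose > 2:
        raise ValueError(f"unknown dispose_op {dispose}")
    return RenderOp(dispose, bool(render_op & BLEND_FLAG), bool(render_op & SKIP_FRAME))
```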

Bits 5 through 7 are reserved and must be set to 0. Valid values for dispose_op are:

value:
0 APNG_RENDER_OP_DISPOSE_NONE
1 APNG_RENDER_OP_DISPOSE_BACKGROUND
2 APNG_RENDER_OP_DISPOSE_PREVIOUS

* APNG_RENDER_OP_DISPOSE_NONE: no disposal is done on this frame before rendering the next; its contents are left on the canvas. This is the default.
* APNG_RENDER_OP_DISPOSE_BACKGROUND: the frame’s region is to be cleared to the background color. If no bKGD chunk is specified, the result is fully transparent black (r, g, b, and a all 0).
* APNG_RENDER_OP_DISPOSE_PREVIOUS: the frame’s region is to be reverted to the contents it had before this frame was rendered.
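A minimal sketch of the three disposal operations on an in-memory canvas, assuming the canvas is a list of rows of 8-bit (r, g, b, a) tuples and the region has already been clipped. For APNG_RENDER_OP_DISPOSE_PREVIOUS the decoder must save the region before drawing the frame; `snapshot` and `dispose` are hypothetical helpers. The default background is fully transparent black, as for a stream without bKGD:

```python
from typing import List, Optional, Tuple

RGBA = Tuple[int, int, int, int]
Canvas = List[List[RGBA]]  # canvas[row][column]

DISPOSE_NONE, DISPOSE_BACKGROUND, DISPOSE_PREVIOUS = 0, 1, 2


def snapshot(canvas: Canvas, x: int, y: int, w: int, h: int) -> Canvas:
    """Copy the region a frame is about to cover (needed for DISPOSE_PREVIOUS)."""
    return [row[x:x + w] for row in canvas[y:y + h]]


def dispose(canvas: Canvas, dispose_op: int, x: int, y: int, w: int, h: int,
            saved: Optional[Canvas] = None,
            background: RGBA = (0, 0, 0, 0)) -> None:
    """Apply a frame's dispose_op to its region of the canvas, in place."""
    if dispose_op == DISPOSE_NONE:
        return  # leave the frame's pixels on the canvas
    if dispose_op not in (DISPOSE_BACKGROUND, DISPOSE_PREVIOUS):
        raise ValueError(f"unknown dispose_op {dispose_op}")
    if dispose_op == DISPOSE_PREVIOUS and saved is None:
        raise ValueError("DISPOSE_PREVIOUS needs the region saved before rendering")
    for dy in range(h):
        for dx in range(w):
            if dispose_op == DISPOSE_BACKGROUND:
                canvas[y + dy][x + dx] = background
            else:
                canvas[y + dy][x + dx] = saved[dy][dx]
```

In a playback loop the order would be: take a snapshot of the frame's region, render the frame, display it for its delay, then call `dispose` before rendering the next frame.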

APNG_RENDER_OP_BLEND_FLAG may be combined with any of the above disposal operations. If this flag is not set, all color components of the frame, including alpha, overwrite the current contents of the frame’s canvas region. If the BLEND_FLAG is set, the frame should be composited onto the canvas according to its alpha, using a simple OVER operation:

Csrc - a color component of a pixel in the frame
Asrc - the alpha component of that pixel in the frame
Cdst - the corresponding color component of the pixel in the canvas region
C - the resulting component value to be rendered in the canvas region

C = Csrc * Asrc + Cdst * (1 - Asrc)
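The formula assumes component and alpha values normalized to the range 0 to 1. The Python sketch below applies it to 8-bit RGBA pixels. The text gives no rule for the resulting alpha, so `blend_pixel_8bit` (a hypothetical helper) assumes the usual OVER result Asrc + Adst * (1 - Asrc); treat that line as an assumption rather than part of the format:

```python
from typing import Tuple

RGBA = Tuple[int, int, int, int]


def blend_component(c_src: float, a_src: float, c_dst: float) -> float:
    """C = Csrc * Asrc + Cdst * (1 - Asrc), all values in [0, 1]."""
    return c_src * a_src + c_dst * (1.0 - a_src)


def blend_pixel_8bit(src: RGBA, dst: RGBA) -> RGBA:
    """Blend an 8-bit frame pixel OVER an 8-bit canvas pixel."""
    a = src[3] / 255
    r, g, b = (round(255 * blend_component(s / 255, a, d / 255))
               for s, d in zip(src[:3], dst[:3]))
    # Assumption: standard OVER alpha; the text's formula covers color only.
    alpha = round(255 * (a + (dst[3] / 255) * (1 - a)))
    return (r, g, b, alpha)
```

An opaque source pixel fully replaces the canvas pixel, and a fully transparent one leaves it unchanged, which matches the behavior expected of OVER.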

Continue reading this post on Vlad1.com

