
Data compression

From Wikipedia, the free encyclopedia.
In computer science, data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use, by means of specific encoding schemes. For example, this article could be encoded with fewer bits if we accept the convention that the word "compression" is encoded as "CP!".
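
The "CP!" convention can be sketched in a few lines of Python. This is only an illustration: the `CODEBOOK` dictionary and the `encode` and `decode` helpers are hypothetical names, and the sketch assumes the code "CP!" never occurs in the original text.

```python
# Hypothetical shared codebook; sender and receiver must both know it.
CODEBOOK = {"compression": "CP!"}

def encode(text, codebook=CODEBOOK):
    """Replace each codebook word with its shorter code."""
    for word, code in codebook.items():
        text = text.replace(word, code)
    return text

def decode(text, codebook=CODEBOOK):
    """Reverse the substitution (assumes codes never appear in plain text)."""
    for word, code in codebook.items():
        text = text.replace(code, word)
    return text

original = "data compression and lossy compression"
encoded = encode(original)
print(encoded)  # data CP! and lossy CP!
print(len(original), "characters before,", len(encoded), "after")
assert decode(encoded) == original
```

A receiver without the same codebook would see only "CP!" and could not recover the word.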

As is the case with any form of communication, compressed data communication only works when both the sender and receiver of the information understand the encoding scheme. For example, this text makes sense only if the receiver understands that it is intended to be interpreted as characters representing the English language. Similarly, compressed data can only be understood if the decoding method is known by the receiver.

One popular encoding scheme that many computer users are familiar with is the ZIP file format. It can be used to reduce the size of an e-mail attachment, making the attachment easier to transmit or store.

Compression is possible because most real-world data contain a great deal of statistical redundancy. In their human-interpretable form (or, in the case of text to be displayed on a computer screen, a simple machine-interpretable form such as ASCII), the data are not represented concisely. For example, the letter 'e' is much more common in English text than the letter 'z', and the letter 'q' is very unlikely to be followed by the letter 'z'. Analysis of these statistical regularities allows the same information to be represented much more concisely.
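
One rough way to quantify this redundancy is the Shannon entropy of the single-character frequencies. The sketch below is an illustration under stated assumptions: `entropy_bits_per_symbol` is a hypothetical helper, the sample sentence is arbitrary, and only individual letter frequencies are measured, so dependencies between neighboring letters (such as what follows 'q') are ignored.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(text):
    """Zero-order Shannon entropy: average bits per character,
    based only on how often each character occurs."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Arbitrary sample text, chosen only for illustration.
sample = "the quick brown fox jumps over the lazy dog and then rests"
h = entropy_bits_per_symbol(sample)
print(f"{h:.2f} bits per character, versus 8 in a one-byte-per-character encoding")
```

The more skewed the character frequencies, the lower the figure, and the more a coder that assigns short codes to common characters can save.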

Further compression is possible if some loss of fidelity is allowable. For example, a person viewing a picture or television video scene might not notice if some of its finest details are removed or not represented perfectly. Similarly, two sequences of samples representing an audio recording may sound the same without being exactly the same under detailed computer analysis. Specialized signal processing techniques exploit this tolerance for relatively minor differences to represent the picture, video, or audio using fewer bits.
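
Uniform quantization is one of the simplest ways to trade fidelity for bits. The following is a minimal sketch, not a real audio codec: the sine-wave samples, the step size of 256, and the `quantize` and `dequantize` helpers are all assumptions made for illustration.

```python
import math

def quantize(samples, step):
    """Map each sample to the index of the nearest multiple of `step`."""
    return [round(s / step) for s in samples]

def dequantize(indices, step):
    """Reconstruct approximate samples from the quantization indices."""
    return [i * step for i in indices]

# Hypothetical audio: one cycle of a sine wave with 16-bit-style amplitudes.
samples = [round(20000 * math.sin(2 * math.pi * n / 32)) for n in range(32)]
step = 256  # assumed step size; larger steps save more bits but distort more

indices = quantize(samples, step)
restored = dequantize(indices, step)
max_error = max(abs(a - b) for a, b in zip(samples, restored))

print("largest index magnitude:", max(abs(i) for i in indices))
print("largest reconstruction error:", max_error)
```

The indices cover a much smaller range than the original samples and so need fewer bits each, while the reconstruction error stays within half a step. Whether that error is noticeable depends on the signal and the listener.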

Compression is important because it helps reduce the consumption of expensive resources, such as disk space or connection bandwidth. However, compression requires information processing power, which can also be expensive. The design of data compression schemes therefore involves trade-offs between various factors, including compression capability, the amount of distortion introduced, computational resource requirements, and often other considerations as well.
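
The trade-off between processing time and compressed size can be observed with Python's standard `zlib` module, which exposes compression levels from 1 (fastest) to 9 (smallest output). This is an informal sketch: the repeated sample sentence is an assumption, and timings and sizes will vary with the machine and the data.

```python
import time
import zlib

# Artificial, highly repetitive input chosen only for illustration.
data = b"compression trades processing time for space. " * 5000

for level in (1, 6, 9):
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Lossless: decompressing must give back exactly the original bytes.
    assert zlib.decompress(packed) == data
    print(f"level {level}: {len(data)} -> {len(packed)} bytes "
          f"in {elapsed * 1000:.1f} ms")
```

On input this repetitive the levels may differ only slightly; less regular data usually shows the gap between speed and size more clearly.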

Some schemes are reversible so that the original data can be reconstructed (lossless data compression), while others accept some loss of data in order to achieve higher compression (lossy data compression).

Table of contents
1 Applications
2 Theory
3 See also
4 References
5 External links

Applications

One very simple means of compression is run-length encoding, in which long runs of consecutive identical data values are replaced by a simple code giving the data value and the length of the run. This is an example of lossless data compression. Lossless compression is often used to make better use of disk space on office computers or of connection bandwidth in a computer network. For symbolic data such as spreadsheets, text, and executable programs, losslessness is essential because changing even a single bit cannot be tolerated (except in some limited cases).
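
Run-length encoding is short enough to sketch in full. The version below is one possible illustration in Python; the `rle_encode` and `rle_decode` names and the sample row of 'W' and 'B' values are hypothetical, and the output is a list of (value, count) pairs rather than a packed byte format.

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of identical values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original string."""
    return "".join(value * count for value, count in pairs)

# Hypothetical scan line: long runs of white (W) with a few black (B) pixels.
row = "W" * 12 + "B" + "W" * 12 + "B" * 3
pairs = rle_encode(row)
print(pairs)  # [('W', 12), ('B', 1), ('W', 12), ('B', 3)]
assert rle_decode(pairs) == row  # lossless: the round trip is exact
```

Data without long runs gains nothing from this scheme and can even grow, because every isolated value still costs a full pair.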

For visual and audio data, some loss of quality can be tolerated without losing the essential nature of the data. By taking advantage of the limitations of the human sensory system, a great deal of space can be saved while producing output that is nearly indistinguishable from the original. These lossy data compression methods typically offer a three-way trade-off between compression speed, compressed data size, and quality loss.

Lossy image compression is used in digital cameras, greatly reducing their storage requirements with little visible loss of picture quality. Similarly, DVDs use the lossy MPEG-2 codec for video compression.

In lossy audio compression, methods of psychoacoustics are used to remove inaudible (or less audible) components of the signal. Compression of human speech is often performed with even more specialized techniques, so that "speech compression" or "voice coding" is sometimes distinguished as a separate discipline from "audio compression". Different audio and speech compression standards are listed under audio codecs. Voice compression is used in Internet telephony, for example, while audio compression is used for CD ripping, and the resulting files are decoded by MP3 players.

Theory

The theoretical background of compression is provided by information theory (which is closely related to algorithmic information theory) and by rate-distortion theory. These fields of study were essentially created by Claude Shannon, who published fundamental papers on the topic in the late 1940s and early 1950s. Cryptography and coding theory are also closely related. The idea of data compression is deeply connected with statistical inference and particularly with the maximum likelihood principle.

Many lossless data compression systems can be viewed in terms of a four-stage model. Lossy data compression systems typically include even more stages, including, for example, prediction, frequency transformation, and quantization.

The Lempel-Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. DEFLATE is a variation on LZ that is optimized for decompression speed and compression ratio, although compression can be slow. DEFLATE is used in PKZIP, gzip and PNG. LZW (Lempel-Ziv-Welch), which is used in GIF images, was covered by a Unisys patent until June 2003. Also noteworthy are the LZR (LZ-Renau) methods, which serve as the basis of the Zip method. LZ methods use a table-based compression model in which table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded (e.g. SHRI, LZX). One LZ-based coding scheme that performs well is LZX, used in Microsoft's CAB format.
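
The idea of a table built dynamically from earlier input can be illustrated with a textbook form of LZW. The sketch below is a simplified illustration, not the exact variant used in GIF: the function names are hypothetical, codes are returned as a plain list of integers instead of being packed into bits, and the table is allowed to grow without limit.

```python
def lzw_compress(data):
    """Return a list of integer codes for the bytes in `data`."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            codes.append(table[current])   # emit code for the longest match
            table[candidate] = len(table)  # learn the new string
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the bytes, recreating the same table from the codes alone."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

text = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
print(len(text), "bytes ->", len(codes), "codes")
assert lzw_decompress(codes) == text
```

The decoder never receives the table. It reconstructs each entry one step behind the encoder, which is why the table does not have to be transmitted with the data.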

See also

Data compression topics

Compression algorithms

Lossless data compression

Lossy data compression

References

External links


