Understanding Barcodes

Barcodes are not the most "hip" or "innovative" solution when it comes to tagging products, identifying tickets, or otherwise providing machine-readable information, but they are undoubtedly the most widespread and probably the most robust one. RFID, NFC, contact chips, and holograms are much "hipper" and supposedly more secure, but none of them can match the price of a barcode: virtually free except for a bit of space to print on. While the other technologies need new and expensive equipment, barcodes can be created with any printer, can be included as pictures in any printed material, and can be read by a huge range of specialized reader hardware or by any smartphone with a camera and the right app installed.

[Image: EAN all zeroes]

Barcodes, as the name suggests, are normally a pattern of black and white bars that encode some information. On the left you see an example of a kind of barcode that you probably use, without even realizing it, several dozen times every time you go shopping: the EAN (International Article Number). Every product that you can buy in a chain store carries one of those; even the receipt that you get for returning empty bottles usually carries an EAN that encodes how much money you will get back from the shop. There are several different kinds of barcode schemes, each with its specific purpose, capabilities and problems.

Below I will give a short overview of some common barcode types, how to use them and how to read them. I will not explain in detail how each code is generated, since Wikipedia already does an outstanding job of this - instead I will concentrate on some common patterns that you need to consider for all barcodes.

Creating Barcodes

Creating a barcode simply means creating a black-and-white picture. There are several ways to go about this:

Use one of the hundreds of available libraries, like GNU barcode or one of the commercial solutions.
Create your own vector image of the barcode and map it onto whatever surface you want to draw on.
Create your own pixel image of the barcode and map it onto whatever surface you want to draw on.

If you want to use a library for reading barcodes, the same library will probably already contain the code to generate them - use it, if only so that you know you are generating barcodes that you can also read. Use the other two options if a library is not an option (e.g. you need to reduce dependencies on 3rd-party code for some reason).

If you target plotting applications then vector images may be necessary to consider - you'll have to calculate the thickness of each bar and its position, since painting bit patterns directly is not exactly an option. Generally, I recommend against it.

For most applications, choose a simple type of barcode that fits your application and implement a bit mapping algorithm to paint it.

Most classic barcodes (1D codes, the kind with actual bars; the modern 2D matrix codes are more complex) use fixed-width patterns to represent symbols. The width is fixed both in the number of bars and in the physical width of the pattern per symbol. For example, in EAN each symbol is represented by two white and two black bars that together are as wide as seven thin bars, with each bar being between one and four thin bars wide. This means that symbols can be represented by simple bit patterns - if "0" means white and "1" means black then the pattern "0001101" represents a decimal zero on the left side of an EAN code. Some codes have distinct patterns that mark the start and/or end of the code - in EAN these are the thin black double lines on the left, on the right, and in the middle of the barcode, which are normally printed longer than the rest of the bars. Those special patterns can also have a different width - in EAN they are made up of single-width lines only (two black and one white at the sides, two black and three white in the middle).
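
The bit-pattern idea can be sketched in a few lines of Python. This is only an illustration, not the method used by any particular library: the function name ean13_modules is made up for this example, the pattern tables are the commonly documented EAN-13 tables, and the function does not verify the check digit.

```python
# Left-hand "L" patterns for the digits 0-9 (0 = white module, 1 = black module).
L_CODES = ["0001101", "0011001", "0010011", "0111101", "0100011",
           "0110001", "0101111", "0111011", "0110111", "0001011"]
# Right-hand "R" patterns are the bitwise complement of the L patterns.
R_CODES = ["".join("1" if b == "0" else "0" for b in p) for p in L_CODES]
# "G" patterns are the R patterns read backwards.
G_CODES = [p[::-1] for p in R_CODES]
# The first digit is not drawn as bars; it selects the L/G mix of the left half.
PARITY = ["LLLLLL", "LLGLGG", "LLGGLG", "LLGGGL", "LGLLGG",
          "LGGLLG", "LGGGLL", "LGLGLG", "LGLGGL", "LGGLGL"]


def ean13_modules(digits: str) -> str:
    """Return the 95 modules of an EAN-13 code as a string of 0s and 1s."""
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly 13 decimal digits")
    values = [int(c) for c in digits]
    left = ""
    for value, kind in zip(values[1:7], PARITY[values[0]]):
        left += L_CODES[value] if kind == "L" else G_CODES[value]
    right = "".join(R_CODES[v] for v in values[7:])
    # side guard + left half + centre guard + right half + side guard
    return "101" + left + "01010" + right + "101"
```

Calling ean13_modules("0000000000000") yields the all-zero code: the side guard, six times "0001101", the centre guard, six times "1110010" and the closing side guard.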

Most programming environments that can handle graphics also have functions to scale bitmap images. So a very common technique is to create an image that can just hold the barcode's width and is one pixel high. The scaling algorithm of the graphics library stretches those pixels into bars of appropriate width and length. There is one pitfall though: make sure to switch anti-aliasing off when you scale the image data - otherwise the bars can get blurred and are more difficult to read afterwards.
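
If no graphics library is at hand, the same effect can be had by repeating pixels, which is what scaling without anti-aliasing amounts to for whole-number factors. The sketch below is an assumption-laden illustration: the names scale_row and to_pbm are invented for this example, and plain PBM is chosen only because it is a trivial text format in which "1" means black.

```python
def scale_row(modules: str, module_px: int, height_px: int) -> list[str]:
    """Stretch a one-pixel-high row of 0/1 modules into bars.

    Every module is repeated module_px times horizontally and the row is
    repeated height_px times vertically, so no grey pixels can appear.
    """
    row = "".join(bit * module_px for bit in modules)
    return [row] * height_px


def to_pbm(rows: list[str]) -> str:
    """Serialize rows of 0/1 characters as a plain PBM (P1) image."""
    bits = "".join(rows)
    # Plain PBM ignores whitespace in the raster; keep the lines short.
    lines = [bits[i:i + 70] for i in range(0, len(bits), 70)]
    return f"P1\n{len(rows[0])} {len(rows)}\n" + "\n".join(lines) + "\n"
```

For example, to_pbm(scale_row("101", 2, 3)) describes a 6 by 3 pixel image with two black bars, each two pixels wide.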

Readers

Barcode readers come in all shapes and sizes: from small hand held devices, through readers embedded in desks to readers integrated into large machines, self-contained expensive devices that operate independently from any external computer, and mobile camera-phones with a barcode reader app installed. The main distinguishing factors are how they communicate with computers, what kind of sensor they use and how expensive they are.

There are four combinations of illumination and scanning sensor in widespread use:

simple LED illumination with a simple CCD line can read 1D codes from a fixed distance with a fixed orientation: the code has to be in focus and has to be aligned exactly with the CCD line; these are the cheapest devices, normally costing only between 20 and 100 Euros
laser scanners scan several lines and often several orientations very quickly and project the illuminated line back onto a light sensor - the 1D code can be on any of the scanned lines/orientations; these are the devices usually built into shop desks, since they are more reliable and much more flexible than the hand held devices; they cost a bit more (typically between 100 and 300 Euros)
full 2D CCD devices normally illuminate their visual field with an LED and are a bit more tolerant than the simple 1D scanners; they are in the same price range as laser scanners
and finally mobile phones, which use the normal photo sensor of the camera built into the phone; matching apps can often be had for free

Apart from these simple devices there is an almost infinite number of industrial and "value-added" kinds of readers that are adapted to reading specific (non-paper) surfaces or add functions beyond simple reading (like storing and counting scans for inventory inspections or comparing barcode information on a train ticket with online passenger data). Those are priced upwards of an imaginary number representing the added value of the device.

[Image: Barcode Scanner]

Simple hand held scanners (like the one shown in the picture) or desk mounted devices usually connect to the computer using PS/2 or USB - in both cases emulating a keyboard. The PS/2 variants usually have two connectors at the end of the cable: one to connect the real keyboard and one to connect to the PC. The normal keyboard signals are routed through the device and its own scan results are added, so that the computer thinks it is just one keyboard. The USB variants just connect to any free USB port and announce themselves as just another USB keyboard. In both cases the manufacturer provides a booklet full of special barcodes that can be used to configure the device's virtual keyboard layout (whether it emulates US, German, French or whatever local layout you are using), its "typing speed" (in case programs get confused if they receive key presses too fast) and various other options.

These devices are incredibly easy to integrate into software: just provide some way to enter the barcode with the keyboard (e.g. some text edit widget). Most devices transmit a barcode as key presses of letters and digits followed by some configurable "end-of-code" sign like the return key or tab. An added bonus is that no additional effort is needed to let the user enter the barcode manually if automatic reading fails.
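
On the software side this boils down to collecting characters until the end-of-code sign arrives. A minimal sketch, assuming the scanner is configured to send return, newline or tab after each code (the function name split_scans is made up for this example):

```python
def split_scans(keystrokes: str, terminators: str = "\r\n\t") -> list[str]:
    """Split a stream of typed characters into complete barcodes.

    A code counts as complete once one of the terminator characters has
    been seen; characters after the last terminator are not returned,
    because that code may still be arriving.
    """
    codes = []
    current = ""
    for ch in keystrokes:
        if ch in terminators:
            if current:  # ignore empty input, e.g. the "\n" of "\r\n"
                codes.append(current)
            current = ""
        else:
            current += ch
    return codes
```

Because manually typed input ends with the same return key, the same function handles both scanned and hand-entered codes.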

Older hand held devices and many industrial devices use a simple RS232 serial connector. Depending on the device, the protocol can range from simply transmitting the scanned code followed by a newline to a complex negotiation between computer and device that carries additional information (like scan quality) on some industrial devices. Often drivers exist to map those serial signals back onto key presses to support simpler applications.
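
For the simplest protocol (code followed by a newline) the main thing to watch is that a serial port delivers bytes in arbitrary chunks, so one code may arrive in several pieces. The following sketch deliberately leaves out the port handling itself and only shows the reassembly; the class name LineAssembler is hypothetical, and the bytes are assumed to be ASCII.

```python
class LineAssembler:
    """Collect byte chunks from a serial port and emit complete codes."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> list[str]:
        """Add a chunk and return every code completed by it."""
        self._buffer += chunk
        # Everything before the last newline is complete; the rest is kept.
        *complete, self._buffer = self._buffer.split(b"\n")
        return [line.strip().decode("ascii", errors="replace")
                for line in complete if line.strip()]
```

Each chunk read from the port is passed to feed(); codes are only reported once their terminating newline has arrived.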

Some expensive "value-added" devices are also able to use various network protocols. Which protocols are used largely depends on the device and the software installed on it.

Each of those connection types has its pros and cons. For most applications it will be desirable to pipe the input into the normal keyboard handling, since this is the easiest way of using the scanner and it also lets applications without specialized interfaces use it. For some specialized applications it may be more convenient to use serial or network transfers - but those will invariably be tied to a few scanners that follow a supported protocol.

Simple hand held scanners are very cheap to procure and easy to handle, but they tend to be quite slow in scanning codes and exhibit all the possible problems when scanning codes (like a very limited field of view). They are often the first choice for application developers, since they give a good impression of the limitations of barcodes, and for any application that does not need particular speed (like scanners in a public library).

Laser scanners are a good choice for shopping environments and other environments that need high speed, but only use 1D codes (e.g. scanning theater tickets at the entrance). They are robust, easy to handle and only slightly more expensive than hand held devices.

More expensive devices are used if the material scanned is tricky to handle (reflective surfaces need special illumination), a higher scan success rate is needed (industrial scanners inside a production line provide better CCDs and algorithms), or if the scanner has to be mobile (inventory scanners or other mobile applications).

One thing to keep in mind is that barcode scanners are as fallible as any other technology. It is quite normal for a cheap scanner to be unable to scan 5% of all codes and have trouble with 10% of all codes. There are several factors influencing reading speed and quality:

the quality of the CCD sensor has a major influence on reading quality - generally the more expensive the basic device the better it is at reading its target medium because it will contain a better CCD sensor - however it is always a good idea to have a backup plan for failed readings
the device's light source also has a major influence on reading results: with cheaper readers you do not have much of a choice, you just have to hope the color, intensity and position match - with industrial devices you may be able to change those parameters
dust can seriously impact reading quality - in most readers there is a protecting glass or plastic pane between the sensor and the barcode - if dust collects on that pane the reader will become more and more unreliable - cleaning the pane helps, especially in dusty environments, but make sure you do not create micro-scratches on it (a soft cloth like the one you would use for your eyeglasses is best)
the contrast of the print: the better the contrast the easier it is to read - best is black on white; colored paper or colored print can seriously deteriorate the result (e.g. under red light red print is indistinguishable from white paper and green paper is indistinguishable from black print); patterned paper also tends to make reading harder
the reflectivity of the paper and print: the more reflective the surface the easier it is to blind the sensor with reflected light - you usually get better reading results with a matte surface
ambient light: not enough light makes it difficult to read if the scanner itself has no light source; too much ambient light can blind the sensor
wrinkles in the paper can make barcodes completely unreadable - they tend to distort the pattern
likewise bending the paper can distort the pattern enough to make it difficult to read - sometimes bending in another direction can help the reading process though
the code used also influences reading quality in the way it is encoded: the easier it is to distinguish thin and thick bars and the wider apart (encoding wise) the coded symbols are the easier it is to read it - as a rule of thumb: the less information is encoded in a given number of bars the easier it is to retrieve that information

One of the less obvious options to optimize reading quality is to switch off barcode types that you do not plan to use - the fewer candidates the scanner has to consider when working out what kind of code it is scanning, the easier it is to distinguish the remaining ones.

Types of Barcodes

There are several dozen types of barcode in widespread use, some of them standardized, some of them just common. This section lists just a handful of barcode types that are useful for most applications.

EAN

The International (originally: European) Article Number is the archetype of a barcode - it is instantly recognized as such, even by people who couldn't tell you what it could possibly be used for, and it is one of the most widely deployed barcode types. Originally started as a European project to unify and streamline cash registers it has grown to subsume dozens of applications:

It is used as a product code well beyond Europe, sometimes under different names - like JAN (Japanese Article Number) in Japan.
As a product code EAN includes and replaces UPC (Universal Product Code).
It includes special codes for ISBN (International Standard Book Number) for books, ISSN (International Standard Serial Number) for magazines, and ISMN (International Sheet Music Number) for printed musical scores.
It includes encodings for coupons, refund receipts (e.g. for returned bottles), and shop internal markers.

Due to its semantics and licensing EAN is limited to shopping applications - it would be impractical to use it for example for serial numbers on devices - it's too easy to accidentally scan that serial number at a cash register.

EAN is quite a robust code that comes in four variants: EAN-13, EAN-8, EAN-5, and EAN-2 - all with their specific applications. EAN-13 is the normal kind of EAN that you will find on most products; it also includes the special encodings for ISBNs. As the name suggests, EAN-13 encodes 13 decimal digits (the last one being a check digit). EAN-8 is a shortened version that is used when space on the product package is scarce - for example on the wrappers of sweets. With only 8 digits the barcode is still readable at the same distance, but significantly smaller than a full EAN-13. EAN-5 and EAN-2 are used to provide additional data next to an EAN-13. You often see an EAN-5 next to the ISBN of a book, where it encodes the recommended retail price of the book. EAN-2 is often printed on periodicals (like magazines) to encode the edition - e.g. "03" for the March edition or "12" for the December edition.

As already mentioned above EAN encodes decimal digits. The meaning of those digits is standardized. With EAN-13 and EAN-8 the first three digits always encode a country or special application - e.g. codes starting with 899 encode products from companies registered in Indonesia, while codes starting with 978 are from "bookland", which is to say they represent ISBNs. The next three to eight digits encode the producer company, the last digit is a check digit that serves to validate that the code has been read correctly, and the remaining digits encode a company internal product number. All 13 digits together must uniquely identify the product.
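
The check digit is calculated with the usual EAN scheme: counting from the right, the data digits are weighted alternately with 3 and 1, and the check digit brings the weighted sum up to a multiple of ten. A minimal sketch (the function names are made up for this example):

```python
def ean_check_digit(data_digits: str) -> int:
    """Check digit for the data digits of an EAN-13 (12) or EAN-8 (7)."""
    total = 0
    for position, ch in enumerate(reversed(data_digits)):
        weight = 3 if position % 2 == 0 else 1  # rightmost digit counts 3
        total += weight * int(ch)
    return (10 - total % 10) % 10


def is_valid_ean(code: str) -> bool:
    """True if code is an 8 or 13 digit EAN with a matching check digit."""
    return (code.isascii() and code.isdigit() and len(code) in (8, 13)
            and ean_check_digit(code[:-1]) == int(code[-1]))
```

For the data digits "400638133393" the weighted sum is 89, so the check digit is 1 and the full code reads 4006381333931.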

In EAN-5 the first digit encodes the currency (1 for British Pounds, 5 for US $, 0 can stand for British Pounds or the local currency of the target market), the remaining four digits encode the price in cents or pennies.
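
Reading an EAN-5 price back is then a matter of splitting off the first digit. The sketch below only knows the three currency digits mentioned above and reports everything else as unknown; the function name and the wording of the currency labels are made up for this example.

```python
# Only the currency digits named in the text; other digits are left unknown.
CURRENCY_BY_FIRST_DIGIT = {
    "0": "GBP or local currency",
    "1": "GBP",
    "5": "USD",
}


def decode_ean5(code: str) -> tuple[str, int]:
    """Return (currency, price in cents or pennies) for an EAN-5 add-on."""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected exactly 5 decimal digits")
    currency = CURRENCY_BY_FIRST_DIGIT.get(code[0], "unknown")
    return currency, int(code[1:])
```

With this, decode_ean5("51299") gives ("USD", 1299), i.e. a price of 12.99 US dollars.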

EAN-2 has no specific encoding, it is usually just counted up starting from "01" for the first in a series.

There is just a handful of applications that will need to print EAN:

You need to replace EANs that have become unreadable.
You are a producer of retail goods and need to generate barcodes for the stores.
You need to label your own internal products for your own store to make selling them easier (e.g. some chains have special in-house products that are not sold anywhere else or you normally do not sell to end customers except in your factory store). There are special "country" codes (e.g. "020" - "029") for this purpose.
You want to label older books with their ISBN (use the old 10-digit ISBN, remove the last digit, prepend 978 and calculate a new EAN check digit).
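
The ISBN conversion from the last item can be sketched as follows. The function name isbn10_to_ean13 is made up for this example; hyphens and spaces in the input are simply dropped, and the old ISBN-10 check character (which may be an "X") is discarded without being verified.

```python
def isbn10_to_ean13(isbn10: str) -> str:
    """Convert an old 10-character ISBN into its 13-digit EAN form."""
    chars = [c for c in isbn10 if c not in "- "]
    if len(chars) != 10:
        raise ValueError("expected 10 characters besides hyphens and spaces")
    # Drop the old check character and prepend the "bookland" prefix.
    body = "978" + "".join(chars[:9])
    if not (body.isascii() and body.isdigit()):
        raise ValueError("the first nine characters must be digits")
    # EAN-13 check digit: weights 1, 3, 1, 3, ... from the left.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(body))
    return body + str((10 - total % 10) % 10)
```

For example, the ISBN 0-306-40615-2 becomes 9780306406157.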

Other than the ones listed above there are almost no legitimate reasons to use EAN - it is too easy to scan them into a cash register. The upside of this is: if you have a reason to use EAN it is a very robust code with good reading quality.

The Wikipedia page linked below shows in detail how to construct EAN codes.

Code-39

This barcode type is aptly named after the number of symbols it originally encoded: letters (it cannot differentiate upper and lower case), digits and three special characters - in sum 39 usable character symbols. Later it was extended to 43 symbols. Each symbol consists of 5 black and 4 white bars, followed by a thin white space that separates it from the next symbol. An additional 44th symbol is reserved as start and stop symbol, which appears first and last in the code - since this symbol is asymmetric, it also tells the scanner which direction the code is facing. Thick bars are usually two to three times as wide as thin bars.

All this makes the code rather robust and relatively versatile, but also has the drawback of encoding only very few symbols in a relatively large space.

There are several variants of the code:

the most widely used variant today encodes 43 different symbols and uses the last symbol in the barcode as a checksum
it is possible to switch the checksum checking off, which is not recommended
the older encoding with 39 symbols just does not use some of the special characters
there is a special encoding that makes full ASCII available at the price of two Code-39 symbols per character, this further reduces the amount of characters that can be encoded in a given space
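
The checksum symbol of the first variant is usually computed modulo 43: every character has a value given by its position in the Code-39 alphabet, and the sum of all values modulo 43 selects the check character. The sketch below assumes the commonly documented 43-character alphabet in its usual order; the names are made up for this example.

```python
# Assumed symbol order: digits, letters, then the seven special characters.
CODE39_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%"


def code39_check_char(data: str) -> str:
    """Return the modulo-43 check character for a Code-39 payload."""
    # str.index raises ValueError for characters outside the alphabet.
    total = sum(CODE39_ALPHABET.index(ch) for ch in data)
    return CODE39_ALPHABET[total % 43]
```

For the payload "CODE39" the values add up to 75, and 75 modulo 43 is 32, which is the letter "W"; the printed code would therefore contain "CODE39W" between the start and stop symbols.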

Realistically you can encode about 10-15 usable characters with Code-39 and stay compatible with most scanners. Code-39 is the encoding of choice if you need high reliability and only need to encode relatively little information (like a short inventory number).

Interleaved 2 of 5

Interleaved 2 of 5 is a relatively dense encoding for decimal digits. Each set of 5 bars and 5 spaces encodes one digit in the black bars and one in the white spaces, with two of the bars and two of the spaces being 2.5-3 times as wide as the others (hence 2 of 5). This means that it can only encode an even number of digits. Optionally the last digit can be used as a check digit, in which case it is calculated the same way as for EAN/UPC.
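
The interleaving itself is easy to show in code. The sketch below uses the commonly documented width table and start/stop patterns, which are not spelled out in this text; its output notation is made up for this example: upper case letters are bars, lower case letters are spaces, and "n"/"w" stand for narrow and wide.

```python
# Width pattern per digit: exactly two of the five elements are wide.
I2OF5_WIDTHS = {
    "0": "nnwwn", "1": "wnnnw", "2": "nwnnw", "3": "wwnnn", "4": "nnwnw",
    "5": "wnwnn", "6": "nwwnn", "7": "nnnww", "8": "wnnwn", "9": "nwnwn",
}


def interleaved_2of5(digits: str) -> str:
    """Return the bar/space sequence for an even number of digits."""
    if len(digits) % 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected an even, non-zero number of decimal digits")
    out = "NnNn"  # start pattern: four narrow elements
    for i in range(0, len(digits), 2):
        bars = I2OF5_WIDTHS[digits[i]].upper()  # first digit: the bars
        spaces = I2OF5_WIDTHS[digits[i + 1]]    # second digit: the spaces
        out += "".join(b + s for b, s in zip(bars, spaces))
    return out + "WnN"  # stop pattern: wide bar, narrow space, narrow bar
```

For the digit pair "12" the bars follow the pattern of "1" (wide, narrow, narrow, narrow, wide) and the spaces that of "2", giving "WnNwNnNnWw" between the start and stop patterns.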

The upside of this encoding is that it can encode a lot of digits in a relatively narrow space and still be comparatively easy to decode. In practice this code is prone to partial reads, because the start and stop patterns also occur within many combinations of digits. If you use this code, always use it with a fixed number of digits, so that partial reads can be recognized by their length.

Code-128

Code-128 is one of the most widely used barcodes, replacing many legacy codes that were used earlier. It is able to encode the full 128-character ASCII set if necessary and uses only a comparatively small amount of space. The downside of this is of course that this code is relatively prone to failed reading and corruption.

This barcode is widely used to provide additional information on product packages, like extended serial numbers, component types, or tracking data for service incidents. In other words: anything that requires more data than the other codes above can store, and information that is either only needed for a very limited time or that makes life easier if immediately available but can be retrieved from databases if necessary.

Sometimes this code type is used for applications that have higher wear on the print medium (like theater tickets), but results are usually not very encouraging - usually it is a better tradeoff to use shorter data and more robust codes.

The encoding of Code-128 seems complex at first: it provides 107 symbols which can have different meanings. There are three encodings: Code-128A, -128B, and -128C. Code-128A encodes ASCII codes 0 (NUL) - 95 (underscore), Code-128B encodes ASCII codes 32 (space) - 127, with the characters common to both (32 - 95) being mapped onto the same bar patterns. Code-128C encodes two decimal digits per symbol. To tell the reader which encoding is used Code-128 provides three different start codes (one for each encoding) and special symbols that switch between those three encodings. Each symbol has three bars and three spaces, except for the stop symbol which uses four bars and three spaces. Each bar and space can be between one and four units wide - all of this makes the code difficult to read if it is bent or scratched.
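
To get a feeling for the three encodings, the following sketch picks a single code set for a whole string, using only the character ranges given above. It is a simplification with made-up function names: a real encoder would also use the switch symbols to mix code sets within one barcode.

```python
def simplest_code_set(text: str) -> str:
    """Pick one Code-128 code set ("A", "B" or "C") for the whole text."""
    if text and len(text) % 2 == 0 and text.isascii() and text.isdigit():
        return "C"  # digit pairs: the densest option
    codes = [ord(ch) for ch in text]
    if all(32 <= c <= 127 for c in codes):
        return "B"  # printable ASCII including lower case
    if all(0 <= c <= 95 for c in codes):
        return "A"  # control characters and upper case
    raise ValueError("text needs a mix of code sets or is not ASCII")


def pairs_for_code_c(digits: str) -> list[int]:
    """Split an even-length digit string into the values stored per symbol."""
    return [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]
```

A six digit number such as "123456" therefore needs only three data symbols in Code-128C (12, 34 and 56), while "Hello" needs Code-128B because of its lower case letters.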

2D Codes

Most 2D codes are not really barcodes, since they use 2-dimensional matrices of dots instead of bars, but they share the idea and origin of barcodes: to encode data in an easy to read printable pattern.

[Image: Code-49]

The simplest type are simply extensions of existing barcodes that shorten the bars and stack several code blocks on top of each other. You can see an example of such a code (Code-49 encoding "Wikipedia") on the left.

Most 2D codes use a rectangular matrix of dots aligned at a 90 degree angle. Data-Matrix and the QR-Code explained below fall into this category. There are a few more exotic codes that use hexagonal or other alignments for the dots.

There were some experiments with 3D codes that use one more property for encoding data: for example the color of the dots. These are no longer in any significant use, since color codes have the bad habit of fading and the reader equipment tends to be rather expensive if it is supposed to work reliably.

QR Code

[Image: QR Code with Wikipedia link]

QR Code has made barcodes, and in particular 2D codes, quite popular recently - long after many industry pundits had decided that the barcode is dead and that NFC or other even more expensive gadgets are the way to go.

Originally QR Code was designed for industrial purposes: it is able to store considerably more data than EAN, Code-39 or even Code-128 in a comparable amount of space and with similar reliability. This is achieved by using two dimensions to store data (the second dimension is largely wasted in 1D codes) and through the liberal use of error correction codes.

Today QR Codes are used to store URLs that any mobile phone can scan and immediately open - a boon to the marketing industry and a constant worry to parents of teenagers with expensive tastes, especially since those URLs can redirect the phone directly into its app store.

[Image: distorted QR Codes]

But thanks to its origin QR Code can be used to store arbitrary data: there are encoding variants for numeric data, alphanumeric data, raw bytes, Kanji, and more - allowing it to store each kind of data very efficiently. The storage type can be switched in the middle of the code. Different sizes ("versions" in QR Code slang) can store very small (5 characters) to relatively large (174 chars with version 10, even over 1600 chars with version 40) amounts of data. Its error correction can be scaled from a relatively measly 7% (a resilience comparable to Code-128) to 30% (high resilience) redundancy. In fact it has become a kind of sport to use the high error correction ability of QR Code to insert pictures into those codes to make them look more attractive (see to the left).

The convention is to store data in URI-style: application:data - this way it can be properly routed to the right mobile application on a mobile phone or tablet. For example QR Codes starting with "market:" are usually routed to the app store application on a phone, while "http:" routes to the browser. But this is by no means mandatory - it just makes life easier for mobile application developers.
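
On the receiving side such routing can be as simple as looking at the part before the first colon. The sketch below is purely illustrative: the routing table and the target names are hypothetical and do not describe how any particular phone platform dispatches scanned codes.

```python
# Hypothetical routing table: URI scheme -> application that handles it.
ROUTES = {
    "http": "browser",
    "https": "browser",
    "market": "app store",
}


def route_scanned_text(text: str) -> tuple[str, str]:
    """Return (target application, original text) for a scanned QR Code."""
    scheme, separator, _rest = text.partition(":")
    if not separator or not scheme:
        return "plain text", text  # no "application:" prefix at all
    return ROUTES.get(scheme.lower(), "unknown application"), text
```

Codes without any prefix fall through as plain text, which matches the point above that the convention is helpful but not mandatory.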

A word on property rights on QR Code: QR Code is an official ISO standard (ISO/IEC 18004). Denso Wave owns several patents on QR Code, but has decided not to exercise those patents. They also own the trademark "QR Code" (you will probably be fine if you do not use it out of context and credit them properly).

Security

Barcodes and other data storage methods (e.g. NFC tags) are often used as security tokens to identify and/or authorize users, machines, or products to access machines or areas - for example scanning barcodes is often used as part of granting entrance to company grounds. So, just one warning: barcodes are no more secure than the readable copy of the data that is usually printed below them.

Example Code

You can find an example JavaScript page that creates EAN barcodes by clicking here. It creates the barcode as an inline graphic. Note: this script does not work with Internet Explorer, due to some missing features in its JavaScript implementation.