Exforsys

C Special Characters

Page 2 of 5
Author: Exforsys Inc.     Published on: 2nd Mar 2006    |   Last Updated on: 6th Jul 2011

C Programming - Constants and Identifiers

Special Characters

You have already seen the escape sequences for specifying a character through its octal or hexadecimal value. Here are some other escape sequences you can use.

To write the single quote itself as a character constant, you must use the escape sequence backslash followed by a single quote:

Sample Code
  char single_quote = '\'';
Copyright exforsys.com


To represent the backslash character itself use two backslashes:

Sample Code
  char backslash = '\\';

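The question mark character can be specified as an escape sequence, but it does not need to be; it can be written as a plain character. The escape form exists so that a question mark can be written without accidentally forming a trigraph (trigraphs are described later in this tutorial):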

Sample Code
  char question1 = '?';
  char question2 = '\?';

These lines will set both question1 and question2 to the question mark character.

The escape sequence backslash followed by lowercase a (\a) is the alert character, which is used to get the user's attention. How the alert is presented depends on the environment where the program runs: it could be a sound or some visual notification.

Sample Code
  char alert = '\a';   /* '\a' is the alert character; plain 'a' would be the letter a */
  printf( "%cThere is a problem with the input file.\n", alert );

On this system the above code produces a flash of the output window and the line is printed in the window. On a different system the alert character may be handled differently.

There are other escape sequences that have to do with moving the cursor on the screen:

Escape sequence   Function

\n                New line. Moves the cursor one line down and to the beginning of the line.
\t                Horizontal tab. Moves the cursor to the next tab position of the output device.
\b                Backspace. Moves the cursor one position back on the same line.
\f                Form feed. Moves the cursor to the next "page" of the output device.
\r                Carriage return. Moves the cursor to the beginning of the current line.
\v                Vertical tab. Moves the cursor to the next vertical tab position of the output device.

As you can see, these escape sequences describe actions that suit old typewriters and printing terminals better than modern displays. That reflects the age of the C language: when it was created, many display systems had more in common with typewriters than with today's graphical interfaces. Of the sequences listed above, you will probably use new line and horizontal tab the most. The others are rarely used, and exactly how they behave depends on the display device.

Another way to specify a character is to use its universal character code as defined in the ISO/IEC 10646 standard. This standard aims to assign a unique number to every character of every language in the world. In the example above, the letter Ă is represented by the hexadecimal number 0x0102 (which is 258 in decimal and 402 in octal). That number comes from the ISO standard.

You can specify the universal character code by using the backslash lowercase u followed by 4 hexadecimal digits, or backslash uppercase U followed by 8 hexadecimal digits. This is similar to the hexadecimal escape sequence except that the hexadecimal sequence can be arbitrarily long; the \u or \U escape sequences are strictly 4 or 8 digits, respectively. We can specify the Ă character these ways:

Sample Code
  /* wchar_t is declared in <stddef.h> (and also in <wchar.h>) */
  wchar_t a4 = L'\u0102';
  wchar_t a5 = L'\U00000102';

These lines define a4 and a5 as the Ă character just as in the example above.

Trigraphs and Digraphs

This section will talk briefly about trigraphs and digraphs. You will probably never see these in real C code anymore, but you should be aware of what they are.

Trigraphs are sequences of 3 characters that stand for another character. Digraphs are sequences of two characters that stand for another character. They were employed when the system used to enter the program source code did not have a way to enter certain special characters like [, | or }. When the system lacked a | character, you could use ??! instead, and the compiler would treat that sequence of characters as a single | character. Here is a table showing the trigraphs and digraphs that C allows:

Sequence   Meaning

??=        #
??(        [
??/        \
??)        ]
??'        ^
??<        {
??!        |
??>        }
??-        ~
<:         [
:>         ]
<%         {
%>         }
%:         #
%:%:       ##


Here is a simple program using trigraphs and digraphs:

Sample Code
  1.    %:include <stdio.h>
  2.    %:include <string.h>
  3.
  4.    int main(void)
  5.    <%
  6.            char hello??(??) = "Hello World!";   /* same as char hello[] */
  7.            size_t i;
  8.
  9.            for ( i = 0; i < strlen(hello); ++i )
  10.           ??<
  11.                   printf("??! %c ", hello??( i ??));
  12.           ??>
  13.           printf("??!\n");
  14.           return 0;
  15.   %>

Output of program:

| H | e | l | l | o |   | W | o | r | l | d | ! |

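When this program is compiled, the compiler recognizes the trigraph and digraph sequences and treats them as if the characters they stand for had been typed in the source. Note that trigraph substitution happens throughout the whole source file, even inside the string literals on lines 11 and 13. Digraphs, by contrast, are recognized only as tokens, so a digraph inside a string literal is left unchanged. Some compilers do not replace trigraphs unless asked to; GCC, for example, needs the -trigraphs option or a strict standard mode such as -std=c99.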



 
This tutorial is part of the C Tutorials series.