Exforsys.com
 
 

SCJP 5 : Chapter 3. API Contents (Part-5)

 


Write code that uses standard J2SE APIs in the java.util and java.util.regex packages to format or parse strings or streams. For strings, write code that uses the Pattern and Matcher classes and the String.split(...) method. Recognize and use regular expression patterns for matching (limited to: . (dot), * (star), + (plus), ?, \d, \s, \w, [], ()). The use of *, +, and ? will be limited to greedy quantifiers, and the parenthesis operator will only be used as a grouping mechanism, not for capturing content during matching. For streams, write code using the Formatter and Scanner classes and the PrintWriter.format/printf methods. Recognize and use formatting parameters (limited to: %b, %c, %d, %f, %s) in format strings.


The Java 2 Platform, Standard Edition (J2SE), version 1.4, introduced the java.util.regex package, which supports regular expressions. This functionality includes metacharacters, which give regular expressions their versatility.


A regular expression, specified as a string, must first be compiled into an instance of the Pattern class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.


A typical invocation sequence is thus:


Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();


The Pattern class defines a static matches method as a convenience for when a regular expression is used just once. This method compiles an expression and matches an input sequence against it in a single invocation. The statement:


boolean b = Pattern.matches("a*b", "aaaaab");


is equivalent to the three statements above, though for repeated matches it is less efficient since it does not allow the compiled pattern to be reused.


Instances of the Pattern class are immutable and are safe for use by multiple concurrent threads. Instances of the Matcher class are not safe for such use.


Character classes

[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
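The character-class forms above can be checked with Pattern.matches, which requires the whole input to match. This is a minimal sketch; the class name CharClassDemo and the sample inputs are illustrative and not part of the original text.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    public static void main(String[] args) {
        // Each pair is {regex, input}; Pattern.matches requires the WHOLE input to match
        String[][] cases = {
            {"[abc]", "b"},          // simple class
            {"[^abc]", "b"},         // negation
            {"[a-zA-Z]", "Q"},       // range
            {"[a-d[m-p]]", "n"},     // union
            {"[a-z&&[def]]", "e"},   // intersection
            {"[a-z&&[^bc]]", "c"},   // subtraction
            {"[a-z&&[^m-p]]", "q"}   // subtraction
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " matches \"" + c[1] + "\": "
                    + Pattern.matches(c[0], c[1]));
        }
    }
}
```

Only the negation case ("b" is excluded by [^abc]) and the first subtraction case ("c" is removed from a-z) print false.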


Predefined character classes

. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
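The predefined classes are easiest to see with String.replaceAll and Pattern.matches. The sketch below is illustrative (the class name and sample text are hypothetical). It also shows that each backslash must be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

public class PredefinedClassDemo {
    public static void main(String[] args) {
        String text = "Order #42: 3 items, ready_now";
        // In Java source each backslash is doubled: "\\d" is the regex \d
        System.out.println(text.replaceAll("\\d", "D")); // every digit becomes D
        System.out.println(text.replaceAll("\\s", "_")); // every whitespace char becomes _
        System.out.println(text.replaceAll("\\W", ""));  // every non-word char is removed
        System.out.println(Pattern.matches("\\w+", "ready_now")); // '_' is a word char
        System.out.println(Pattern.matches("a.c", "a-c"));        // '.' matches '-'
    }
}
```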


Pattern Class

An instance of the Pattern class represents a regular expression that is specified in string form in a syntax similar to that used by Perl.


A regular expression, specified as a string, must first be compiled into an instance of the Pattern class. The resulting pattern is used to create a Matcher object that matches arbitrary character sequences against the regular expression. Many matchers can share the same pattern because it is stateless.


The compile method compiles the given regular expression into a pattern, then the matcher method creates a matcher that will match the given input against this pattern. The pattern method returns the regular expression from which this pattern was compiled.

The split method is a convenience method that splits the given input sequence around matches of this pattern. The following example uses split to break up a string of input separated by commas and/or whitespace:


import java.util.regex.*;

public class Splitter {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match breaks
        Pattern p = Pattern.compile("[,\\s]+");
        // Split input with the pattern
        String[] result = p.split("one,two, three four , five");
        for (int i = 0; i < result.length; i++) {
            System.out.println("|" + result[i] + "|");
        }
    }
}



The output:


|one|
|two|
|three|
|four|
|five|


Matcher Class

Instances of the Matcher class match character sequences against a given pattern. Input is provided to matchers through the CharSequence interface, so matching works on characters from a wide variety of input sources.


A matcher is created from a pattern by invoking the pattern's matcher method. Once created, a matcher can be used to perform three different kinds of match operations:


  • The matches method attempts to match the entire input sequence against the pattern.
  • The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
  • The find method scans the input sequence looking for the next sequence that matches the pattern.

Each of these methods returns a boolean indicating success or failure. More information about a successful match can be obtained by querying the state of the matcher.
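The three kinds of match operation can be compared on one input. This is a minimal sketch with an illustrative class name and input. The reset() call matters: after a successful lookingAt(), find() would otherwise continue after the previous match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchKindsDemo {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d+");
        Matcher m = p.matcher("123abc456");

        System.out.println("matches:   " + m.matches());   // whole input must match
        System.out.println("lookingAt: " + m.lookingAt()); // only a prefix must match

        m.reset(); // make find() start again from the beginning of the input
        while (m.find()) {
            // start() is inclusive, end() is exclusive
            System.out.println("find: \"" + m.group() + "\" at " + m.start() + "-" + m.end());
        }
    }
}
```

Here matches() fails because "abc456" follows the digits, lookingAt() succeeds on the prefix "123", and find() reports both digit runs.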


The Matcher class also defines methods for replacing matched sequences by new strings whose contents can, if desired, be computed from the match result.


The appendReplacement method appends everything up to the next match, followed by the replacement for that match. The appendTail method appends the rest of the input after the last match.


The following code sample demonstrates the use of the java.util.regex package. It writes "One dog, two dogs in the yard" to the standard output stream:


import java.util.regex.*;

public class Replacement {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match cat
        Pattern p = Pattern.compile("cat");
        // Create a matcher with an input string
        Matcher m = p.matcher("One cat, two cats in the yard");
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        // Loop through and create a new String with the replacements
        while (result) {
            m.appendReplacement(sb, "dog");
            result = m.find();
        }
        // Add the last segment of input to the new String
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}
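When every match gets the same fixed replacement, Matcher.replaceAll gives the same result as the appendReplacement/appendTail loop. String.replaceAll is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl). A short sketch (class name illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllDemo {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("cat").matcher("One cat, two cats in the yard");
        // Same result as the appendReplacement/appendTail loop for a fixed replacement
        System.out.println(m.replaceAll("dog"));
        // Shortcut on String that compiles the pattern on every call
        System.out.println("One cat, two cats in the yard".replaceAll("cat", "dog"));
    }
}
```

The explicit loop is still useful when each replacement must be computed from the matched text.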



Quantifiers

Quantifiers specify the number of occurrences of a pattern, which lets us control how many times a pattern must occur in a string. Table 3.1 summarizes how to use quantifiers:


Table 3.1. Quantifiers

Greedy    Reluctant   Possessive   Occurrence of a pattern X
X?        X??         X?+          X, once or not at all
X*        X*?         X*+          X, zero or more times
X+        X+?         X++          X, one or more times
X{n}      X{n}?       X{n}+        X, exactly n times
X{n,}     X{n,}?      X{n,}+       X, at least n times
X{n,m}    X{n,m}?     X{n,m}+      X, at least n but not more than m times



The first three columns show regular expressions that match strings in which X occurs the given number of times. The last column describes the meaning of the regular expressions in that row. Each kind of occurrence has three types of quantifier (greedy, reluctant, and possessive), and they differ in how they match. Before explaining these differences, it's important to understand the metacharacters used in quantifiers.


The most general quantifier is {n,m}, where n and m are integers. X{n,m} matches strings in which X repeats at least n times but no more than m times. For instance, X{3,5} matches XXX, XXXX, and XXXXX but not X, XX, or XXXXXX.
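The X{3,5} example can be checked directly with Pattern.matches. A minimal sketch; the class name is illustrative. Java's regex parser does not allow a space inside the braces, so the bound is written {3,5}.

```java
import java.util.regex.Pattern;

public class BoundedQuantifierDemo {
    public static void main(String[] args) {
        String[] inputs = {"X", "XX", "XXX", "XXXX", "XXXXX", "XXXXXX"};
        for (String s : inputs) {
            // No space inside the braces: "X{3, 5}" is rejected by Pattern.compile
            System.out.println(s + " -> " + Pattern.matches("X{3,5}", s));
        }
    }
}
```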


Besides controlling how many times X occurs, a quantifier also determines how the Matcher searches for a match. This is why each kind of occurrence has a greedy, a reluctant, and a possessive form.


A greedy quantifier makes the Matcher consume the entire input string first. If the match fails, the Matcher backs off one character at a time and tries the match again after each step. It stops when a match is found or when there are no more characters left.


A reluctant quantifier takes the opposite approach. The Matcher starts at the beginning of the input and consumes as little as possible (initially nothing). If the match fails, it consumes one more character and tries again, repeating the process until a match is found. The last thing it tries is the entire input string.


A possessive quantifier, unlike the other two, makes the Matcher consume the entire input string and try for a match only once. It never backs off, even if backing off would let the overall match succeed.


Table 3.2 shows the difference between the greedy quantifier (first row), the reluctant quantifier (second row), and the possessive quantifier (third row) when each is used with Matcher.find() on the input string "whellowwwwwwhellowwwwww":


Table 3.2. Difference between quantifiers

Regular Expression   Result
.*hello              Found the text "whellowwwwwwhello" starting at index 0 and ending at index 17.
.*?hello             Found the text "whello" starting at index 0 and ending at index 6.
                     Found the text "wwwwwwhello" starting at index 6 and ending at index 17.
.*+hello             No match found.
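The results in Table 3.2 can be reproduced with a find() loop. This is a sketch, not the original test harness; the class name and message format are illustrative. The possessive .*+ swallows the rest of the input, including any "hello", and never gives it back, so "hello" can never match afterwards.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    public static void main(String[] args) {
        String input = "whellowwwwwwhellowwwwww";
        String[] regexes = {".*hello", ".*?hello", ".*+hello"}; // greedy, reluctant, possessive
        for (String regex : regexes) {
            Matcher m = Pattern.compile(regex).matcher(input);
            boolean found = false;
            while (m.find()) {
                found = true;
                // end() is exclusive, matching the "ending at index" values in the table
                System.out.println(regex + ": found \"" + m.group() + "\" from "
                        + m.start() + " to " + m.end());
            }
            if (!found) {
                System.out.println(regex + ": no match found");
            }
        }
    }
}
```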



Capturing groups

Quantifiers also work on groups of characters when capturing groups are used. A capturing group treats several characters as a single unit. For instance, (java) is a capturing group in which java is treated as one unit, so the string javajava matches the regular expression (java)*. The part of the input string that matches a capturing group is saved and can later be recalled with a back reference.


Java numbers the capturing groups in a regular expression by counting their opening parentheses from left to right. For example, the regular expression ((A)(B(C))) contains the following four capturing groups:


1. ((A)(B(C)))
2. (A)
3. (B(C))
4. (C)


You can invoke the Matcher method groupCount() to determine how many capturing groups there are in a Matcher's Pattern.
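A short sketch of groupCount() and group(i) for ((A)(B(C))). The class name is illustrative. groupCount() depends only on the pattern, while group(i) needs a successful match first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("((A)(B(C)))").matcher("ABC");
        // groupCount() counts opening parentheses; group 0 (the whole match) is not included
        System.out.println("groupCount = " + m.groupCount());
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println("group " + i + ": " + m.group(i));
            }
        }
    }
}
```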


Capturing groups are numbered so that a stored part of the string can be recalled with a back reference. A back reference is written \n, where n is the number of the capturing group to recall. In a Java string literal the backslash is doubled, as in "\\1".


Table 3.3. Groups usage

Whole Content   Regular Expression   Result
abab            ([a-z][a-z])\1       Found the text "abab" starting at index 0 and ending at index 4.
abcd            ([a-z][a-z])\1       No match found.
abcd            ([a-z][a-z])         Found the text "ab" starting at index 0 and ending at index 2.
                                     Found the text "cd" starting at index 2 and ending at index 4.
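The first two rows of Table 3.3 can be checked with Pattern.matches. The sketch below uses an illustrative class name. The third pattern, a repeated-word check, is an extra illustration that is not taken from the table.

```java
import java.util.regex.Pattern;

public class BackReferenceDemo {
    public static void main(String[] args) {
        // "\\1" in Java source is the back reference \1 (the text captured by group 1)
        String regex = "([a-z][a-z])\\1";
        System.out.println(Pattern.matches(regex, "abab")); // "ab" followed by "ab"
        System.out.println(Pattern.matches(regex, "abcd")); // "cd" does not repeat "ab"
        System.out.println(Pattern.matches("(\\w+) \\1", "hello hello")); // a repeated word
    }
}
```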



String.split()

J2SE 1.4 added the split() method to the String class to simplify the task of breaking a string into substrings, or tokens. This method uses a regular expression to specify the delimiters. Regular expressions were popularized by Unix tools such as grep, whose name comes from the ed command g/re/p ("globally search for a regular expression and print").


For the full syntax, see almost any introductory Unix text or the Java API documentation for the java.util.regex.Pattern class.


In its simplest form, searching for a regular expression consisting of a single character finds a match of that character. For example, the character 'x' is a match for the regular expression "x".


The split() method takes the regular expression to use as a delimiter and returns a String array containing the tokens it delimits. For example:


String str = "This is a string object";
String[] words = str.split(" ");
for (String word : words) {
    System.out.println(word);
}



The output:


This
is
a
string
object


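NOTE: for this input, str.split(" ") gives the same result as str.split("\\s"), but the two are not equivalent in general. "\\s" matches any whitespace character (space, tab, newline, and so on), while " " matches only the space character.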


To use "*" (which is a "special" regex character) as a delimiter, specify "\\*" as the regular expression (escape it):


String str = "A*bunch*of*stars";
String[] starwords = str.split("\\*");

Printing each element of starwords gives:



A
bunch
of
stars


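NOTE: in Java source code, a backslash inside a string literal must itself be escaped, so always write a double backslash: "\\s", "\\d", "\\*". A single backslash, as in "\d" or "\*", is an invalid escape sequence and the code does not compile: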


String str = "boo and foo";
str.split("\s"); // WRONG! Does not compile in Java 5 (invalid escape sequence)



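Eclipse can run a class even though it contains compile errors, so there the failure appears at run time (the standard javac compiler rejects the source with an "illegal escape character" error):

Exception in thread "main" java.lang.Error: Unresolved compilation problem:
    Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )

    at regex.Replacement.main(Replacement.java:15)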



The following example splits on the single character "d":


String str = "My1Daddy2cooks34pudding";
String[] words = str.split("d");
for (String word : words) {
    System.out.println(word);
}



gives the following output:


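My1Da

y2cooks34pu

ing

Each empty line is an empty string. The input contains "dd" twice, and split() returns the empty token between each pair of adjacent delimiters.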


The same string, split with the regular expression "\\d" (any digit) instead of the literal "d":


String str = "My1Daddy2cooks34pudding";
String[] words = str.split("\\d"); // any digit, NOT the letter "d"
for (String word : words) {
    System.out.println(word);
}



The output:


My
Daddy
cooks

pudding

The empty line is the empty string between the adjacent digits "3" and "4".


public String[] split(String regex)


Splits this string around matches of the given regular expression. This method works as if by invoking the two-argument split(...) method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array. The string "boo:and:foo", for example, yields the following results with these expressions:


// requires: import java.util.Arrays;
String str = "boo:and:foo";
System.out.println(Arrays.toString(str.split(":")));
System.out.println(Arrays.toString(str.split("o")));



The output:


[boo, and, foo]
[b, , :and:f]



public String[] split(String regex, int limit)



Splits this string around matches of the given regular expression.


The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.


The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded:


String str = "boo:and:foo";
System.out.println("a. " + Arrays.toString(str.split(":", 2)));
System.out.println("b. " + Arrays.toString(str.split(":", 5)));
System.out.println("c. " + Arrays.toString(str.split(":", -2)));
System.out.println("d. " + Arrays.toString(str.split("o", 5)));
System.out.println("e. " + Arrays.toString(str.split("o", -2)));
System.out.println("f. " + Arrays.toString(str.split("o", 0)));

The output:

a. [boo, and:foo]
b. [boo, and, foo]
c. [boo, and, foo]
d. [b, , :and:f, , ]
e. [b, , :and:f, , ]
f. [b, , :and:f]



An invocation of this method of the form str.split(regex, n) yields the same result as the expression:


Pattern.compile(regex).split(str, n)
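A quick check of that equivalence for several limits. This is a sketch with an illustrative class name. Compiling the Pattern once and reusing it avoids recompiling the regex on every String.split call.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitEquivalenceDemo {
    public static void main(String[] args) {
        String str = "boo:and:foo";
        Pattern o = Pattern.compile("o"); // compiled once, reused for every limit
        int[] limits = {2, 5, -2, 0};
        for (int n : limits) {
            String[] viaString = str.split("o", n);
            String[] viaPattern = o.split(str, n);
            System.out.println("limit " + n + ": " + Arrays.toString(viaString)
                    + " same=" + Arrays.equals(viaString, viaPattern));
        }
    }
}
```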


Formatted input

The Scanner API (java.util.Scanner, added in J2SE 5.0) provides basic input functionality for reading data from the system console or any data stream. The following example reads a String from standard input, followed by an int value:


Scanner s = new Scanner(System.in);
String param = s.next();
int value = s.nextInt();
s.close();



The Scanner methods like next and nextInt will block if no data is available. For more complex input, Scanner also supports regular-expression matching through methods such as findInLine and match. (The java.util.Formatter class handles formatted output, not input.)


java.util.Scanner is a simple text scanner which can parse primitive types and strings using regular expressions.


A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types using the various next methods.


For example, this code allows a user to read a number from System.in:


Scanner sc = new Scanner(System.in);
int i = sc.nextInt();



As another example, this code reads long values from a file named myNumbers:


Scanner sc = new Scanner(new File("myNumbers"));
while (sc.hasNextLong()) {
    long aLong = sc.nextLong();
}



The scanner can also use delimiters other than whitespace. This example reads several items from a string:


String input = "1 fish 2 fish red fish blue fish";
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.nextInt());
System.out.println(s.nextInt());
System.out.println(s.next());
System.out.println(s.next());
s.close();



prints the following output:


1
2
red
blue


The same output can be generated with this code, which uses a regular expression to parse all four tokens at once:


String input = "1 fish 2 fish red fish blue fish";
Scanner s = new Scanner(input);
s.findInLine("(\\d+) fish (\\d+) fish (\\w+) fish (\\w+)");
MatchResult result = s.match(); // java.util.regex.MatchResult
for (int i = 1; i <= result.groupCount(); i++) {
    System.out.println(result.group(i));
}
s.close();



Class java.util.Scanner implements a simple text scanner (lexical analyzer) which uses regular expressions to parse primitive types and strings from its source.


A Scanner converts the input from its source into tokens using a delimiter pattern, which by default matches whitespace.


The tokens can be converted into values of different types using the various next() methods:


Scanner scanner = new Scanner(System.in); // Connected to standard input.
int i = scanner.nextInt();



Scanner scanner = new Scanner(new File("myLongNumbers")); // (1) Construct a scanner.
while (scanner.hasNextLong()) {        // (2) End of input? May block.
    long aLong = scanner.nextLong();   // (3) Deal with the current token. May block.
}
scanner.close();                       // (4) Closes the scanner. May close the source.



Before parsing the next token with a particular next() method, for example at (3), a lookahead can be performed by the corresponding hasNext() method as shown at (2).


The next() and hasNext() methods and their primitive-type companion methods (such as nextInt() and hasNextInt()) first skip any input that matches the delimiter pattern, and then attempt to return the next token.


Constructing a Scanner

A scanner must be constructed to parse text:


Scanner(Type source)


Constructs a scanner that reads from the given source. Type can be a String, a File, an InputStream, a ReadableByteChannel, or a Readable (implemented by CharBuffer and various Readers).


Scanning

A scanner throws an InputMismatchException when the next token cannot be translated into a valid value of the requested type.


Lookahead methods:


// returns true if this scanner has another token in its input
boolean hasNext()

// returns true if the next token matches the specified pattern
boolean hasNext(Pattern pattern)

// returns true if the next token matches the pattern constructed
// from the specified string
boolean hasNext(String pattern)

// returns true if the next token in this scanner's input can be interpreted as a
// value of the numeric type 'XXX' in the default or specified radix
boolean hasNextXXX()
boolean hasNextXXX(int radix)

// returns true if the next token in this scanner's
// input can be interpreted as a boolean value using
// a case insensitive pattern created from the string
// "true|false"
boolean hasNextBoolean()



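The name XXX can be: Byte, Short, Int, Long, Float, Double, BigInteger or BigDecimal. The radix versions exist only for the integral types Byte, Short, Int, Long and BigInteger.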
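A common pattern is to look ahead with hasNextInt() before calling nextInt(), so that a bad token is skipped instead of raising InputMismatchException. This is a minimal sketch with an illustrative class name and input. As documented for Scanner, the token that caused an InputMismatchException is not consumed, so it can still be read with next().

```java
import java.util.InputMismatchException;
import java.util.Scanner;

public class SafeIntScan {
    public static void main(String[] args) {
        Scanner scanner = new Scanner("10 twenty 30 4x 50");
        int sum = 0;
        while (scanner.hasNext()) {
            if (scanner.hasNextInt()) {          // look ahead before consuming
                sum += scanner.nextInt();
            } else {
                System.out.println("skipping: " + scanner.next()); // consume the bad token
            }
        }
        scanner.close();
        System.out.println("sum = " + sum);

        Scanner strict = new Scanner("abc");
        try {
            strict.nextInt();                    // no lookahead: throws
        } catch (InputMismatchException e) {
            // the offending token is still available
            System.out.println("InputMismatchException; token still there: " + strict.next());
        }
        strict.close();
    }
}
```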


Methods that parse the next token:


// scans and returns the next complete token from this scanner
String next()

// returns the next token if it matches the specified pattern
String next(Pattern pattern)

// returns the next token if it matches the pattern constructed from the specified string
String next(String pattern)

// scans the next token of the input as an 'xxx' value corresponding to 'XXX'
xxx nextXXX()
xxx nextXXX(int radix)

// scans the next token of the input into a boolean
// value and returns that value
boolean nextBoolean()

// advances this scanner past the current line and
// returns the input that was skipped
String nextLine()



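The name XXX can be: Byte, Short, Int, Long, Float, Double, BigInteger or BigDecimal, and the corresponding return type 'xxx' is byte, short, int, long, float, double, BigInteger or BigDecimal. As with the lookahead methods, nextXXX(int radix) exists only for the integral types.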
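Mixing nextInt() with nextLine() often surprises people. nextInt() consumes only the number, so the next nextLine() returns the rest of that same line, which is often empty. A minimal sketch (class name and input are illustrative):

```java
import java.util.Scanner;

public class NextLineDemo {
    public static void main(String[] args) {
        Scanner sc = new Scanner("42\nhello world\n");
        int n = sc.nextInt();          // reads "42" but leaves the line terminator unread
        String rest = sc.nextLine();   // returns the rest of line 1: an empty string
        String line = sc.nextLine();   // returns the whole second line
        sc.close();
        System.out.println(n + " [" + rest + "] [" + line + "]");
    }
}
```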


Example:


// assumes: import static java.lang.System.out; import java.util.Locale;
String input = "123 45,56 TRUE 567 722 blabla";
Scanner scanner = new Scanner(input);
// "45,56" is a double only in a locale whose decimal separator is a comma,
// so the locale is set explicitly here
scanner.useLocale(Locale.GERMANY);
out.println(scanner.hasNextInt());
out.println(scanner.nextInt());
out.println(scanner.hasNextDouble());
out.println(scanner.nextDouble());
out.println(scanner.hasNextBoolean());
out.println(scanner.nextBoolean());
out.println(scanner.hasNextInt());
out.println(scanner.nextInt());
out.println(scanner.hasNextLong());
out.println(scanner.nextLong());
out.println(scanner.hasNext());
out.println(scanner.next());
out.println(scanner.hasNext());
scanner.close();



The output:


true
123
true
45.56
true
true
true
567
true
722
true
blabla
false


Error in parsing:


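// assumes: import static java.lang.System.out; import java.util.Locale;
String input = "123,123";
Scanner scanner = new Scanner(input);
// With a comma decimal separator, "123,123" is not an int. (In Locale.US the comma
// is a grouping separator, and the token would be read as the int 123123.)
scanner.useLocale(Locale.GERMANY);
out.println(scanner.hasNextInt());
out.println(scanner.nextInt());
scanner.close();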



The output (runtime exception):


false
Exception in thread "main" java.util.InputMismatchException
    at java.util.Scanner.throwFor(Unknown Source)
    at java.util.Scanner.next(Unknown Source)
    ...



Formatted output

Developers now have the option of using printf-type functionality to generate formatted output. This will help migrate legacy C applications, as the same text layout can be preserved with little or no change.


Most of the common C printf formatters are available, and in addition some Java classes like Date and BigInteger also have formatting rules. See the java.util.Formatter class for more information. Although the standard UNIX newline '\n' character is accepted, for cross-platform support of newlines the Java %n is recommended. Furthermore, J2SE 5.0 added a printf() method to the PrintStream class. So now you can use System.out.printf() to send formatted numerical output to the console. It uses a java.util.Formatter object internally:


System.out.printf("name count%n");
System.out.printf("%s %5d%n", user, total); // user (a String) and total (an int) defined elsewhere



The simplest of the overloaded versions of the method has this signature:


printf(String format, Object... args)



The format argument is a string in which you embed specifier substrings that indicate how the arguments appear in the output. For example:


double pi = Math.PI;
System.out.printf("1. pi = %5.3f %n", pi);
System.out.printf("2. pi = %f %n", pi);
System.out.printf("3. pi = %b %n", pi);
System.out.printf("4. pi = %s %n", pi);



results in the console output:


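1. pi = 3.142
2. pi = 3.141593
3. pi = true
4. pi = 3.141592653589793

%f prints six digits after the decimal point by default. %b prints true for any non-null argument that is not a Boolean. %s prints the value's toString() form. In a locale that uses a comma as the decimal separator (German, for example), lines 1 and 2 print 3,142 and 3,141593 instead.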
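The exam objective also covers the Formatter class and PrintWriter.format/printf. The sketch below is illustrative (the class name, values, and widths are hypothetical). It uses the five conversions %b, %c, %d, %f and %s, and passes Locale.US explicitly so that the decimal separator is always a period.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Formatter;
import java.util.Locale;

public class FormatDemo {
    public static void main(String[] args) {
        // A Formatter writing into a StringBuilder; Locale.US fixes '.' as the decimal separator
        StringBuilder sb = new StringBuilder();
        Formatter f = new Formatter(sb, Locale.US);
        f.format("%s has %d items, %c grade, in stock: %b, price %.2f%n",
                "Cart", 3, 'A', true, 9.5);
        f.close();
        System.out.print(sb);

        // PrintWriter.printf (added in Java 5) accepts the same format syntax
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        pw.printf(Locale.US, "|%5d|%-5s|%8.3f|%n", 42, "ab", Math.PI); // widths pad the fields
        pw.flush();
        System.out.print(sw);
    }
}
```

The second line shows right-aligned (%5d), left-aligned (%-5s) and width-plus-precision (%8.3f) fields, printed as |   42|ab   |   3.142|.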





 

 


Copyright © 2000 - 2010 exforsys.com. All Rights Reserved
