java.text.NumberFormat failures in number parsing

Applications often have to read decimal numbers or amounts in a locale-dependent format. This is an old and well-known problem. The simplest solution (in the Java world) is to use a NumberFormat instance for the given locale:

Locale loc = ...;   // given locale
String input = ...; // an input from user

NumberFormat numFormat = NumberFormat.getNumberInstance(loc);

try {
    Number num = numFormat.parse(input);
    System.out.println("Parsed number: " + num);
}
catch (ParseException e) {
    System.out.println(input + " detected as invalid: " + e.getMessage());
}

Please note that you’re indirectly using NumberFormat in JSF as well, whenever an <h:inputText> without a converter attribute is mapped to a field of a numeric type. In that case a default Converter implementation parses the text input from the form, and all Converter implementations in JSF (package: javax.faces.convert) use NumberFormat.

The issue seems to be solved. But have a look at how NumberFormat handles different inputs (the numbers on the right are printed in the “C” locale):

Using locale: English (United Kingdom)
1.0 parsed as: 1.0
1,0 parsed as: 10
1.00 parsed as: 1.00
1,00 parsed as: 100
1.01 parsed as: 1.01
1,01 parsed as: 101
1,0a parsed as: 10
1..0 parsed as: 1
1.,0 parsed as: 1
1..0..5ww parsed as: 1
1,,0..6zz parsed as: 10

Using locale: Danish (Denmark)
1.0 parsed as: 10
1,0 parsed as: 1.0
1.00 parsed as: 100
1,00 parsed as: 1.00
1.01 parsed as: 101
1,01 parsed as: 1.01
1,0a parsed as: 1.0
1..0 parsed as: 10
1.,0 parsed as: 1.0
1..0..5ww parsed as: 105
1,,0..6zz parsed as: 1

Using locale: Polish (Poland)
1.0 parsed as: 1
1,0 parsed as: 1.0
1.00 parsed as: 1
1,00 parsed as: 1.00
1.01 parsed as: 1
1,01 parsed as: 1.01
1,0a parsed as: 1.0
1..0 parsed as: 1
1.,0 parsed as: 1
1..0..5ww parsed as: 1
1,,0..6zz parsed as: 1

Problems with NumberFormat that I see (as a pedantic developer):

  1. digit group separators are ignored, whether or not they are used properly
  2. parsing stops at the first invalid character, and the result is whatever was read up to that point

As a result, many invalid inputs are parsed without any error. Invalid input is detected only when the text starts with an invalid character. Misuse of the decimal point or of the digit group separator is not detected at all. I think this is unacceptable!

You can check these problems with the NumberFormat class yourself using my simple program:

import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class NumberFormatTesting {

    public static void main(String[] args) {
        testForLocale("GBR");
        testForLocale("DNK");
        testForLocale("POL");
    }

    private static void testForLocale(String countryCode3) {
        Locale loc = findLocaleByCode3(countryCode3);
        System.out.println("Using locale: " + loc.getDisplayName());

        NumberFormat formatter = NumberFormat.getNumberInstance(loc);

        // try to better parse BigDecimal numbers
        if (formatter instanceof DecimalFormat) {
            ((DecimalFormat)formatter).setParseBigDecimal(true);
        }

        // will invalid numbers be detected?
        parse(formatter, "1.0");
        parse(formatter, "1,0");
        parse(formatter, "1.00");
        parse(formatter, "1,00");
        parse(formatter, "1.01");
        parse(formatter, "1,01");
        parse(formatter, "1,0a");
        parse(formatter, "1..0");
        parse(formatter, "1.,0");
        parse(formatter, "1..0..5ww");
        parse(formatter, "1,,0..6zz");

        System.out.println();
    }

    private static void parse(NumberFormat formatter, String input) {
        Number num;
        try {
            num = formatter.parse(input);
        }
        catch (ParseException e) {
            System.out.println(input + " detected as invalid: " + e.getMessage());
            return;
        }

        System.out.println(input + " parsed as: " + num);
    }

    private static Locale findLocaleByCode3(String code) {
        for (Locale locale : Locale.getAvailableLocales()) {
            if (code.equals(locale.getISO3Country())) {
                return locale;
            }
        }
        throw new RuntimeException("Cannot find locale: " + code);
    }
}

It looks like others have run into this problem as well.

Inadequate attempts to solve

  1. An article from 2006 (published on IBM developerWorks) gives simple advice on using the NumberFormat class in a slightly better way: call the NumberFormat.parse(java.lang.String, java.text.ParsePosition) method and check whether the entire input text was consumed. Yet this simple check still misses many invalid inputs.
  2. Apache Commons BeanUtils has the BigDecimalConverter class, which almost does the job. In my tests (the same test cases as above) it detected all invalid inputs for the Polish locale but accepted some bad inputs for the English and Danish locales. Moreover, it doesn’t preserve the scale of the input (in the sense of the BigDecimal class).
  3. IBM ICU has replacement classes for NumberFormat and DecimalFormat, but they are just as bad as the standard Java classes. Very disappointing.
  4. GWT has its own NumberFormat class but… I was unable to use it in a simple, non-GWT Java application because of a mysterious runtime error:
    java.lang.UnsupportedOperationException: ERROR: GWT.create() is only usable in client code! It cannot be called, for example, from server code. If you are running a unit test, check that your test case extends GWTTestCase and that GWT.create() is not called from within an initializer or constructor.
    Besides, the library is far too big to pull in just to solve such a problem.
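
The check described in item 1 can be sketched as follows. This is only a minimal illustration of the idea, not the code from that article; the class name WholeInputCheck and the helper parseWhole are names I made up for this example. It shows that the check rejects trailing garbage but still accepts a misplaced group separator.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

public class WholeInputCheck {

    // Hypothetical helper: returns the parsed number, or null when parsing
    // fails or stops before the end of the input.
    public static Number parseWhole(NumberFormat format, String input) {
        ParsePosition pos = new ParsePosition(0);
        Number num = format.parse(input, pos);
        if (num == null || pos.getIndex() != input.length()) {
            return null;
        }
        return num;
    }

    public static void main(String[] args) {
        NumberFormat uk = NumberFormat.getNumberInstance(Locale.UK);
        // trailing "a" is not consumed, so the input is rejected
        System.out.println("1,0a -> " + parseWhole(uk, "1,0a"));
        // the misplaced group separator is silently skipped, so this is still accepted as 10
        System.out.println("1,0 -> " + parseWhole(uk, "1,0"));
    }
}
```

So the ParsePosition check helps with problem 2 from my list above, but does nothing about problem 1.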

Solution

It looks like one has to prepare one's own solution, like mine below. It is based on constructing a regular expression for numbers according to the given locale. Valid numbers are then normalized so that they can be passed as an argument to the constructor of BigDecimal.

import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;
import java.util.regex.Pattern;


public class BigDecimalParser {

    private final Pattern pattern;
    private final String groupSep;
    private final String decimalSep;

    public BigDecimalParser(Locale loc) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(loc);
        int groupingSize = findGroupingSize(loc);
        groupSep = "" + symbols.getGroupingSeparator();
        decimalSep = "" + symbols.getDecimalSeparator();

        StringBuilder sb = new StringBuilder();
        // optional minus sign
        sb.append(symbols.getMinusSign()).append('?');
        if (groupingSize > 1) {
            // integer part: correctly grouped digits, or plain digits, or a single zero
            sb.append("(([1-9]\\d{0,").append(groupingSize - 1);
            sb.append("})(\\").append(groupSep).append("\\d{");
            sb.append(groupingSize).append("})*|([1-9]\\d*)|0)");
        }
        else {
            // no usable grouping size: plain digits or a single zero
            sb.append("(([1-9]\\d*)|0)");
        }
        // optional fraction part
        sb.append("(\\").append(decimalSep).append("\\d+)?");
        pattern = Pattern.compile(sb.toString());
    }

    public BigDecimal parse(String s) {
        if (s == null || (s = s.trim()).isEmpty()) {
            throw new RuntimeException("Input is empty");
        }
        if (!pattern.matcher(s).matches()) {
            throw new RuntimeException("Invalid number: " + s);
        }
        s = s.replace(groupSep, "");    // remove grouping separator
        s = s.replace(decimalSep, "."); // convert decimal separator

        try {
            return new BigDecimal(s);
        }
        catch (NumberFormatException nfe) {
            throw new RuntimeException("Number parsing error", nfe);
        }
    }

    // grouping size used by the locale's default number format, or 0 if unknown
    private int findGroupingSize(Locale loc) {
        NumberFormat f = NumberFormat.getNumberInstance(loc);
        if (f instanceof DecimalFormat) {
            return ((DecimalFormat)f).getGroupingSize();
        }
        return 0;
    }
}

And here is my test output:

Using locale: English (United Kingdom)
0 parsed as: 0
0,1 detected as invalid: Invalid number: 0,1
0.1 parsed as: 0.1
1 parsed as: 1
1.0 parsed as: 1.0
1,0 detected as invalid: Invalid number: 1,0
1.00 parsed as: 1.00
1,00 detected as invalid: Invalid number: 1,00
1.01 parsed as: 1.01
1,01 detected as invalid: Invalid number: 1,01
1,0a detected as invalid: Invalid number: 1,0a
1..0 detected as invalid: Invalid number: 1..0
1.,0 detected as invalid: Invalid number: 1.,0
1..0..5ww detected as invalid: Invalid number: 1..0..5ww
1,,0..6zz detected as invalid: Invalid number: 1,,0..6zz

Using locale: Danish (Denmark)
0 parsed as: 0
0,1 parsed as: 0.1
0.1 detected as invalid: Invalid number: 0.1
1 parsed as: 1
1.0 detected as invalid: Invalid number: 1.0
1,0 parsed as: 1.0
1.00 detected as invalid: Invalid number: 1.00
1,00 parsed as: 1.00
1.01 detected as invalid: Invalid number: 1.01
1,01 parsed as: 1.01
1,0a detected as invalid: Invalid number: 1,0a
1..0 detected as invalid: Invalid number: 1..0
1.,0 detected as invalid: Invalid number: 1.,0
1..0..5ww detected as invalid: Invalid number: 1..0..5ww
1,,0..6zz detected as invalid: Invalid number: 1,,0..6zz

Using locale: Polish (Poland)
0 parsed as: 0
0,1 parsed as: 0.1
0.1 detected as invalid: Invalid number: 0.1
1 parsed as: 1
1.0 detected as invalid: Invalid number: 1.0
1,0 parsed as: 1.0
1.00 detected as invalid: Invalid number: 1.00
1,00 parsed as: 1.00
1.01 detected as invalid: Invalid number: 1.01
1,01 parsed as: 1.01
1,0a detected as invalid: Invalid number: 1,0a
1..0 detected as invalid: Invalid number: 1..0
1.,0 detected as invalid: Invalid number: 1.,0
1..0..5ww detected as invalid: Invalid number: 1..0..5ww
1,,0..6zz detected as invalid: Invalid number: 1,,0..6zz
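
To make the regular expression from the Solution section more concrete, here is a minimal sketch that hard-codes the expression the BigDecimalParser constructor would assemble for a locale with ',' as the group separator, '.' as the decimal separator, '-' as the minus sign and a grouping size of 3. I assume these are the values Java reports for English (United Kingdom). The class name GroupedNumberPatternDemo and the isValid helper are made up for this illustration.

```java
import java.util.regex.Pattern;

public class GroupedNumberPatternDemo {

    // Assumed symbols: minus '-', group separator ',', decimal separator '.', grouping size 3.
    // Integer part: 1-3 leading digits followed by full groups of 3, or plain digits, or "0".
    public static final Pattern NUMBER = Pattern.compile(
            "-?(([1-9]\\d{0,2})(\\,\\d{3})*|([1-9]\\d*)|0)(\\.\\d+)?");

    // Hypothetical helper: true only when the whole input matches the expression.
    public static boolean isValid(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"1,234.50", "1234.50", "12,34.50", "1,0", "01.5"};
        for (String s : inputs) {
            System.out.println(s + " -> " + (isValid(s) ? "valid" : "invalid"));
        }
    }
}
```

Group separators are optional here (1234.50 is fine), but once they are used, every group must be complete, so 12,34.50 and 1,0 are rejected. Leading zeros are rejected as well.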

About krzysztoftomaszewski

I have an M.Sc. in software engineering. I graduated in 2005 from the Institute of Computer Science, Warsaw University of Technology, Faculty of Electronics and Information Technology. I have been working in computer software design and engineering continuously since 2004.