- Why strings are immutable?
- String pool concept
- Keyword ‘intern’ usage
- Matching regular expressions
- String comparison
- Memory leak issue
As in most other programming languages, a String in Java is a sequence of characters. In older JDKs such as Java 6, the String class stores those characters in a private array:

```java
/** The value is used for character storage. */
private final char value[];
```

To describe which part of that array a given String uses, the following fields are used:

```java
/** The offset is the first index of the storage that is used. */
private final int offset;

/** The count is the number of characters in the String. */
private final int count;
```
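The fields above describe older JDKs such as Java 6; later releases changed the layout (offset and count were removed in Java 7 update 6, and value became a byte[] in Java 9). To check which layout your own JDK uses, the small program below looks up String's declared fields by name. The class and method names are illustrative. Reading a field's declaration with getDeclaredField() does not require setAccessible(true). Note that Class.toString() prints array types in JVM notation, e.g. class [C for char[] and class [B for byte[].

```java
import java.lang.reflect.Field;

public class StringLayoutCheck {
    // Returns the declared type of a private String field, or null if the
    // running JDK has no field with that name. Only the declaration is read,
    // so no setAccessible(true) call is needed.
    static Class<?> fieldType(String name) {
        try {
            Field field = String.class.getDeclaredField(name);
            return field.getType();
        } catch (NoSuchFieldException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("value:  " + fieldType("value"));
        System.out.println("offset: " + fieldType("offset"));
        System.out.println("count:  " + fieldType("count"));
    }
}
```

On a current JDK you should expect value to be a byte array and offset/count to print null; on Java 6 you would expect char[], int and int.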
Why strings are immutable?
- We all know that strings in Java are immutable.
The question here is: why? Why immutable? Let's analyze.
- The first reason I can think of is performance. Java was designed to speed up application development, which was comparatively slow in earlier languages.
- The JVM designers must have realized that real-world applications consist largely of strings: labels, messages, configuration, output and countless other uses.
Seeing such heavy use, they imagined how dangerous improper use of strings could be, so they came up with the concept of the String pool (next section).
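- The String pool is simply a collection of unique string instances. The basic idea behind it is to reuse a string once it has been created. This sharing is only safe because strings are immutable: if one reference could modify a pooled string, every other reference to it would see the change.
- This way, if the same string literal appears 20 times in the code, the application ends up with only one instance of it.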
- The second reason I see is security. Strings are the most common parameter type in almost every part of Java programming. Whether you are loading a driver or opening a URL connection, you pass the information as a string parameter.
- If strings were mutable, they would have opened a Pandora's box of security issues: a string could be changed after it had been validated, for example between a permission check and the actual file or network access.
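A quick way to see the guarantee the security argument relies on: every String method that appears to change a string actually returns a new object. In the sketch below (the class, method and path are made up for illustration), code that has already validated path can keep trusting it, because nothing done with the string afterwards can alter it.

```java
public class ImmutabilityDemo {
    // "Modifies" a string: concat() returns a new String and leaves
    // 'original' exactly as it was.
    static String appendSuffix(String original, String suffix) {
        return original.concat(suffix);
    }

    public static void main(String[] args) {
        String path = "/home/app/config";    // imagine this value was already validated
        String changed = appendSuffix(path, "/../../etc/passwd");

        System.out.println(path);            // /home/app/config
        System.out.println(changed);         // /home/app/config/../../etc/passwd
        System.out.println(path == changed); // false: a different object
    }
}
```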
String pool concept
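- The String pool is a special memory area where string literals (and strings added via intern()) are stored. In Java 6 and earlier it lived in the permanent generation, separate from the regular heap; since Java 7 it is part of the main heap.
- String variables can refer to these pooled objects throughout the life cycle of the application.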
In Java, a String can be created in several ways. Let's look at the two most common ones:
1) String assignment
```java
String str = "abc";
```
- The code above causes the JVM to check whether the pool already contains a string “abc” (the same character sequence). If such a string exists, the JVM simply assigns the reference of the existing object to the variable str; otherwise, a new object “abc” is created in the pool and its reference is assigned to str.
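The effect of this lookup can be observed with the == operator, which compares references rather than contents. In this minimal sketch (class and method names are illustrative), two variables initialized from the same literal end up pointing at the very same pooled object, which the Java Language Specification guarantees for string literals.

```java
public class LiteralPoolDemo {
    // Two literals with the same character sequence resolve to the
    // same pooled String instance.
    static boolean literalsShareInstance() {
        String first = "abc";
        String second = "abc";
        return first == second; // reference comparison, not content comparison
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance()); // true
    }
}
```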
2) Using new keyword
```java
String str = new String("abc");
```
- This version can end up creating two objects in memory: one in the string pool holding the character sequence “abc” (if that literal is not already there), and a second one in regular heap memory, referred to by the variable str and holding the same character sequence “abc”.
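The difference from plain assignment shows up as soon as you compare references. In the sketch below (the helper copyOf is an illustrative name that just wraps the constructor), every new String(...) call yields a distinct heap object, even though all of them hold the same characters as the pooled literal.

```java
public class NewStringDemo {
    // Wraps the constructor so the comparisons are easy to reuse.
    static String copyOf(String source) {
        return new String(source); // always allocates a new String object
    }

    public static void main(String[] args) {
        String literal = "abc";         // pooled instance
        String created = copyOf("abc"); // separate heap object
        String another = copyOf("abc"); // yet another heap object

        System.out.println(created == literal);      // false: different objects
        System.out.println(created == another);      // false: each call allocates
        System.out.println(created.equals(literal)); // true: same characters
    }
}
```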
Keyword ‘intern’ usage
- Despite the heading, intern() is a method of the String class, not a keyword. When the intern() method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
```java
String str = new String("abc");
// intern() returns the pooled instance; assign the result, or it is lost
str = str.intern();
```
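Because intern() returns a reference instead of changing the object it is called on, discarding its result leaves the variable pointing at the heap copy. This minimal sketch (class and method names are illustrative) contrasts the two cases.

```java
public class InternDemo {
    static boolean afterIgnoringResult() {
        String str = new String("abc");
        str.intern();        // result discarded: str still points to the heap copy
        return str == "abc";
    }

    static boolean afterAssigningResult() {
        String str = new String("abc");
        str = str.intern();  // str now points to the pooled "abc"
        return str == "abc";
    }

    public static void main(String[] args) {
        System.out.println(afterIgnoringResult());  // false
        System.out.println(afterAssigningResult()); // true
    }
}
```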
- It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true. In other words, if s and t are different String objects with the same character sequence, calling intern() on both returns the same pooled instance, which both variables can then refer to.
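The "if and only if" rule can be checked with strings built at runtime, which are always fresh heap objects rather than literals. In this sketch the helper build and the sample values are illustrative: two equal strings intern to one shared instance, while a string with different content interns to a different one.

```java
public class InternEqualityDemo {
    // Builds a string at runtime, so the result is a new heap object.
    static String build(String a, String b) {
        return new StringBuilder(a).append(b).toString();
    }

    public static void main(String[] args) {
        String s = build("pool-", "demo");
        String t = build("pool-", "demo");
        String u = build("pool-", "other");

        System.out.println(s == t);                   // false: two distinct objects
        System.out.println(s.equals(t));              // true: same characters
        System.out.println(s.intern() == t.intern()); // true: one pooled instance
        System.out.println(s.intern() == u.intern()); // false: different content
    }
}
```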
Matching Regular expressions
It is not a secret, but it is a useful feature if you have not explored it yet. You have probably seen Pattern and Matcher used for regular expression matching. The String class provides its own shortcut, which you can use directly. str.matches(regex) gives exactly the same result as Pattern.matches(regex, str), so the regular expression must match the entire string, not just a part of it.
```java
String str = new String("abc");
// "<regex>" is a placeholder for your own pattern
boolean matched = str.matches("<regex>");
```
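The whole-string requirement is the most common surprise with this shortcut. The sketch below uses an illustrative input and an illustrative helper, containsMatch, to contrast String.matches() and Pattern.matches() with Matcher.find(), which accepts a match anywhere in the input.

```java
import java.util.regex.Pattern;

public class MatchesDemo {
    // Partial matching: true if the regex occurs anywhere in the input.
    static boolean containsMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String str = "abc123";

        System.out.println(str.matches("[a-z]+\\d+"));    // true: the whole string matches
        System.out.println(str.matches("\\d+"));          // false: "abc" is not digits
        System.out.println(Pattern.matches("\\d+", str)); // false: same result as str.matches
        System.out.println(containsMatch(str, "\\d+"));   // true: find() accepts a partial match
    }
}
```

Since String.matches() compiles the regular expression on every call, code that matches in a loop is usually better off compiling a Pattern once and reusing it.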
String comparison
This is another favorite interview topic. There are generally two ways to compare objects:
- Using == operator
- Using equals() method
The == operator compares object references, i.e. memory address equality. So if two string variables refer to the same literal in the string pool, or to the same string object in the heap, then s == t returns true; otherwise it returns false.
The equals() method is overridden in the String class and compares the character sequences held by the two string objects. If they store the same character sequence, s.equals(t) returns true; otherwise it returns false.
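A classic interview twist is concatenation: whether == returns true depends on when the string is built. Per the Java Language Specification, a concatenation of literals is a compile-time constant and is pooled like any literal, while concatenation involving a non-final variable creates a new String at runtime. The helper concatAtRuntime in this sketch is an illustrative name.

```java
public class ComparisonDemo {
    // Operands are ordinary parameters, so the result is built at runtime.
    static String concatAtRuntime(String a, String b) {
        return a + b;
    }

    public static void main(String[] args) {
        String literal = "hello";
        String constant = "hel" + "lo";              // compile-time constant: pooled
        String runtime = concatAtRuntime("hel", "lo"); // new object on each call

        System.out.println(literal == constant);     // true
        System.out.println(literal == runtime);      // false
        System.out.println(literal.equals(runtime)); // true
    }
}
```

This is why equals() is the right choice for comparing string contents: == only happens to work when both sides are known to be the same pooled instance.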
Memory leak issue
So far we have gone through the basics. Now for something more serious. Have you ever created substrings from a string object? I bet you have. But do you know how substring is implemented internally, and how it could create memory leaks?
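- Substrings in Java are created using the substring(int beginIndex) method and its overloaded forms.
- In JDK 6 (and JDK 7 before update 6), all these methods create a new String object that shares the original value[] array and only sets its own offset and count fields, which we saw at the start of this article.
- The original value[] array is left unchanged and is not copied. So if you create a string with 10000 chars and then 100 substrings of 5-10 chars each, all 101 objects share the same char array of 10000 chars. The leak appears when the large string is no longer needed but some of its small substrings are still referenced: the whole 10000-char array cannot be garbage collected.
- Since Java 7 update 6, substring() copies only the characters it needs, so current JDKs no longer have this problem.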
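To make the sharing concrete without depending on JDK internals, here is a hypothetical, simplified model of the old layout. SharedSlice and its methods are illustrative names, not part of any JDK. Its substring() only adjusts offset and count, as described above, so a tiny slice keeps the whole backing array reachable; compact() copies just the visible range, which is the idea behind the fix in this section.

```java
import java.util.Arrays;

// Hypothetical model of the JDK 6 String layout: a view over a shared
// char[] described by (offset, count). Not the real java.lang.String.
public class SharedSlice {
    private final char[] value; // backing storage, possibly shared
    private final int offset;   // first index used by this slice
    private final int count;    // number of chars in this slice

    public SharedSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Old-style substring: no copying, the new slice reuses 'value'.
    public SharedSlice substring(int beginIndex) {
        if (beginIndex < 0 || beginIndex > count) {
            throw new IndexOutOfBoundsException("beginIndex: " + beginIndex);
        }
        return new SharedSlice(value, offset + beginIndex, count - beginIndex);
    }

    // Copies only the visible chars, dropping the reference to the big array.
    public SharedSlice compact() {
        return new SharedSlice(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    public int backingLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSlice whole = new SharedSlice("i_love_java");
        SharedSlice sub = whole.substring(7);
        System.out.println(sub + " uses an array of " + sub.backingLength());         // java uses an array of 11
        SharedSlice trimmed = sub.compact();
        System.out.println(trimmed + " uses an array of " + trimmed.backingLength()); // java uses an array of 4
    }
}
```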
Let's see this using a program:
```java
import java.lang.reflect.Field;
import java.util.Arrays;

// The output below is from JDK 6 (or JDK 7 before update 6). Newer JDKs store
// String contents differently and may refuse reflective access to "value".
public class SubStringTest {
    public static void main(String[] args) throws Exception {
        // Our main String
        String mainString = "i_love_java";
        // Substring holds value 'java'
        String subString = mainString.substring(7);
        System.out.println(mainString);
        System.out.println(subString);

        // Let's see what's inside mainString
        Field innerCharArray = String.class.getDeclaredField("value");
        innerCharArray.setAccessible(true);
        char[] chars = (char[]) innerCharArray.get(mainString);
        System.out.println(Arrays.toString(chars));

        // Now peek inside subString
        chars = (char[]) innerCharArray.get(subString);
        System.out.println(Arrays.toString(chars));
    }
}
```

Output:

```
i_love_java
java
[i, _, l, o, v, e, _, j, a, v, a]
[i, _, l, o, v, e, _, j, a, v, a]
```
Clearly, both objects store the same char array, even though subString needs only four characters.
Let's solve this issue with our own code:
```java
import java.lang.reflect.Field;
import java.util.Arrays;

// The output below is from JDK 6 (or JDK 7 before update 6). Newer JDKs store
// String contents differently and may refuse reflective access to "value".
public class SubStringTest {
    public static void main(String[] args) throws Exception {
        // Our main String
        String mainString = "i_love_java";
        // Substring holds value 'java'
        String subString = fancySubstring(7, mainString);
        System.out.println(mainString);
        System.out.println(subString);

        // Let's see what's inside mainString
        Field innerCharArray = String.class.getDeclaredField("value");
        innerCharArray.setAccessible(true);
        char[] chars = (char[]) innerCharArray.get(mainString);
        System.out.println(Arrays.toString(chars));

        // Now peek inside subString
        chars = (char[]) innerCharArray.get(subString);
        System.out.println(Arrays.toString(chars));
    }

    // Our new method prevents the memory leak: in JDK 6, new String(String)
    // copies only the needed range when the source shares a larger array
    public static String fancySubstring(int beginIndex, String original) {
        return new String(original.substring(beginIndex));
    }
}
```

Output:

```
i_love_java
java
[i, _, l, o, v, e, _, j, a, v, a]
[j, a, v, a]
```
Now the substring holds only the characters it needs. The intermediate substring used to create it can be garbage collected, and once the main string is no longer referenced, its large array no longer stays in memory because of the small substring.