java-threads-tutorial.pdf | |
File Size: | 110 kb |
File Type: |
What is concurrency?
Concurrency is the ability to run several programs or several parts of a program in parallel. If a time consuming task can be performed asynchronously or in parallel, this improve the throughput and the interactivity of the program.
A modern computer has several CPU's or several cores within one CPU. The ability to leverage these multi-cores can be the key for a successful high-volume application.
Process vs. threads
A process runs independently and isolated of other processes. It cannot directly access shared data in other processes. The resources of the process, e.g. memory and CPU time, are allocated to it via the operating system.
A thread is a so called lightweight process. It has its own call stack, but can access shared data of other threads in the same process. Every thread has its own memory cache. If a thread reads shared data it stores this data in its own memory cache. A thread can re-read the shared data.
A Java application runs by default in one process. Within a Java application you work with several threads to achieve parallel processing or asynchronous behavior.
Improvements and issues with concurrency
Within a Java application you work with several threads to achieve parallel processing or asynchronous behavior.
Limits of concurrency gains
Concurrency promises to perform certain task faster as these tasks can be divided into subtasks and these subtasks can be executed in parallel. Of course the runtime is limited by parts of the task which can be performed in parallel.
The theoretical possible performance gain can be calculated by the following rule which is referred to as Amdahl's Law.
If F is the percentage of the program which can not run in parallel and N is the number of processes, then the maximum performance gain is 1/ (F+ ((1-F)/n)).
Concurrency issues
-Threads have their own call stack, but can also access shared data. Therefore you have two basic problems, visibility and access problems.
-A visibility problem occurs if thread A reads shared data which is later changed by thread B and thread A is unaware of this change.
-An access problem can occur if several thread access and change the same shared data at the same time.
-Visibility and access problem can lead to
Concurrency in Java Processes and Threads
A Java program runs in its own process and by default in one thread. Java supports threads as part of the Java language via the Thread code. The Java application can create new threads via this class.
Java 1.5 also provides improved support for concurrency with the in the java.util.concurrent package.
Locks and thread synchronization
Java provides locks to protect certain parts of the code to be executed by several threads at the same time. The simplest way of locking a certain method or Java class is to define the method or class with the synchronized keyword.
The synchronized keyword in Java ensures:
public synchronized void critial() { // some thread critical stuff // here }
You can also use the synchronized keyword to protect blocks of code within a method. This block is guarded by a key, which can be either a string or an object. This key is called the lock. All code which is protected by the same lock can only be executed by one thread at the same time
For example the following data structure will ensure that only one thread can access the inner block of the add() and next()methods.
package java.threads.BenAtwa;
import java.util.ArrayList;
import java.util.List;
/*
* Data structure for a web crawler. Keeps track of the visited sites and keeps
* a list of sites which needs still to be crawled.
* @author salah atwa
*/
public class CrawledSites
{
private List<String> crawledSites = new ArrayList<String>();
private List<String> linkedSites = new ArrayList<String>();
public void add(String site)
{
synchronized (this) {
if (!crawledSites.contains(site))
{
linkedSites.add(site);
}
}
}
/** * Get next site to crawl. Can return null (if nothing to crawl) */
public String next()
{if (linkedSites.size() == 0) { return null; } synchronized (this) {
// Need to check again if size has changed
if (linkedSites.size() > 0)
{
String s = linkedSites.get(0);
linkedSites.remove(0);
crawledSites.add(s);
return s;
}
return null;
}
}
}
Volatile
If a variable is declared with the volatile keyword then it is guaranteed that any thread that reads the field will see the most recently written value. The volatile keyword will not perform any mutual exclusive lock on the variable.
As of Java 5 write access to a volatile variable will also update non-volatile variables which were modified by the same thread. This can also be used to update values within a reference variable, e.g. for a volatile variable person. In this case you must use a temporary variable person and use the setter to initialize the variable and then assign the temporary variable to the final variable. This will then make the address changes of this variable and the values visible to other threads.
The Java memory model
Overview
The Java memory model describes the communication between the memory of the threads and the main memory of the application.
It defines the rules how changes in the memory done by threads are propagated to other threads. The Java memory model also defines the situations in which a thread re-fresh its own memory from the main memory.
It also describes which operations are atomic and the ordering of the operations.
Atomic operation
An atomic operation is an operation which is performed as a single unit of work without the possibility of interference from other operations.
The Java language specification guarantees that reading or writing a variable is an atomic operation(unless the variable is of typelong or double). Operations variables of type long or double are only atomic if they declared with the volatile keyword. .
Assume i is defined as int. The i++ (increment) operation it not an atomic operation in Java. This also applies for the other numeric types, e.g. long. etc).
The i++ operation first reads the value which is currently stored in i (atomic operations) and then it adds one to it (atomic operation). But between the read and the write the value of i might have changed.
Since Java 1.5 the java language provides atomic variables, e.g. AtomicInteger or AtomicLong which provide methods likegetAndDecrement(), getAndIncrement() and getAndSet() which are atomic.
Memory updates in synchronized code
The Java memory model guarantees that each thread entering a synchronized block of code sees the effects of all previous modifications that were guarded by the same lock.
To make a class immutable make
An immutable class may have some mutable data which is uses to manages its state but from the outside this class nor any attribute of this class can get changed.
For all mutable fields, e.g. Arrays, that are passed from the outside to the class during the construction phase, the class needs to make a defensive-copy of the elements to make sure that no other object from the outside still can change the data
Defensive Copies
You must protect your classes from calling code. Assume that calling code will do its best to change your data in a way you didn't expect it. While this is especially true in case of immutable data it is also true for non-immutable data which you still not expect that this data is changed outside your class.
To protect your class against that you should copy data you receive and only return copies of data to calling code.
The following example creates a copy of a list (ArrayList) and returns only the copy of the list. This way the client of this class cannot remove elements from the list.
package threads.benatwa;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class MyDataStructure
{
List<String> list = new ArrayList<String>();
public void add(String s) { list.add(s); }
/** * Makes a defensive copy of the List and return it * This way cannot modify the list itself * * @return List<String> */
public List<String> getList() { return Collections.unmodifiableList(list); }
}
Threads in Java
The base means for concurrency are is the java.lang.Threads class. A Thread executes an object of typejava.lang.Runnable.
Runnable is an interface with defines the run() method. This method is called by the Thread object and contains the work which should be done. Therefore the "Runnable" is the task to perform. The Thread is the worker who is doing this task.
The following demonstrates a task (Runnable) which counts the sum of a given range of numbers. Create a new Java project called de.vogella.concurrency.threads for the example code of this section.
package threads.benatwa;
/**
* MyRunnable will count the sum of the number from 1 to the parameter
* countUntil and then write the result to the console.
* <p>
* MyRunnable is the task which will be performed *
* @author salah atwa
* */
public class MyRunnable implements Runnable
{
private final long countUntil;
MyRunnable(long countUntil) { this.countUntil = countUntil; }
@Override public void run()
{
long sum = 0;
for (long i = 1; i < countUntil; i++)
{
sum += i;
}
System.out.println(sum);
}
}
Using the Thread class directly has the following disadvantages.
Threads pools with the Executor Framework
Tip You find this examples in the source section in Java project called benatwa.concurrency.threadpools.
Thread pools manage a pool of worker threads. The thread pools contains a work queue which holds tasks waiting to get executed.
A thread pool can be described as a collection of Runnable objects (work queue) and a connections of running threads. These threads are constantly running and are checking the work query for new work. If there is new work to be done they execute this Runnable. The Thread class itself provides a method, e.g. execute(Runnable r) to add a new Runnable object to the work queue.
The Executor framework provides example implementation of the java.util.concurrent.Executor interface, e.g. Executors.newFixedThreadPool(int n) which will create n worker threads. The ExecutorService adds life cycle methods to the Executor, which allows to shutdown the Executor and to wait for termination.
Tip If you want to use one thread pool with one thread which executes several runnables you can use the Executors.newSingleThreadExecutor() method.
Create again the Runnable.
package benatwa.concurrency.threadpools;
/**
* MyRunnable will count the sum of the number from 1 to the parameter
* countUntil and then write the result to the console.
* <p> * MyRunnable is the task which will be performed
* salah atwa
* */
public class MyRunnable implements Runnable
{
private final long countUntil;
MyRunnable(long countUntil) { this.countUntil = countUntil; }
@Override public void run() {
long sum = 0;
for (long i = 1; i < countUntil; i++)
{
sum += i;
}
System.out.println(sum);
}
}
Now you run your runnables with the executor framework.
package ben atwa.concurrency.threadpools;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class Main
{
private static final int NTHREDS = 10;
public static void main(String[] args)
{
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
for (int i = 0; i < 500; i++)
{
Runnable worker = new MyRunnable(10000000L + i);
executor.execute(worker);
}
// This will make the executor accept no new threads
// and finish all existing threads in the queue
executor.shutdown();
// Wait until all threads are finish
executor.awaitTermination();
System.out.println("Finished all threads");
}
}
In case the threads should return some value (result-bearing threads) then you can use the java.util.concurrent.Callableclass.
Futures and Callables
The code examples for this section are created in a Java project called benatwa.concurrency.callables.
The executor framework presented in the last chapter works with Runnables. Runnable do not return result.
In case you expect your threads to return a computed result you can use java.util.concurrent.Callable. The Callable object allows to return values after completion.
The Callable object uses generics to define the type of object which is returned.
If you submit a Callable object to an Executor the framework returns an object of type java.util.concurrent.Future. This Future object can be used to check the status of a Callable and to retrieve the result from the Callable.
On the Executor you can use the method submit to submit a Callable and to get a future. To retrieve the result of the future use the get() method.
package benatwa.concurrency.callables;
import java.util.concurrent.Callable;
public class MyCallable implements Callable<Long>
{
@Override public Long call() throws Exception
{
long sum = 0; for (long i = 0; i <= 100; i++)
{ sum += i; }
return sum;
}
}
package benatwa.concurrency.callables;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class CallableFutures
{
private static final int NTHREDS = 10;
public static void main(String[] args)
{
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
List<Future<Long>> list = new ArrayList<Future<Long>>();
for (int i = 0; i < 20000; i++)
{
Callable<Long> worker = new MyCallable();
Future<Long> submit = executor.submit(worker);
list.add(submit);
}
long sum = 0;
System.out.println(list.size());
// now retrieve the result
for (Future<Long> future : list) { try { sum += future.get(); }
catch (InterruptedException e) { e.printStackTrace(); }
catch (ExecutionException e) { e.printStackTrace(); } }
System.out.println(sum);
executor.shutdown();
}
}
Deadlock
A concurrent application has the risk of a deadlock. A set of processes are deadlocked if all processes are waiting for an event which another process in the same set has to cause.
For example if thread A waits for a lock on object Z which thread B holds and thread B wait for a look on object Y which is hold be process A then these two processes are looked and cannot continue in their processing.
Concurrency is the ability to run several programs or several parts of a program in parallel. If a time consuming task can be performed asynchronously or in parallel, this improve the throughput and the interactivity of the program.
A modern computer has several CPU's or several cores within one CPU. The ability to leverage these multi-cores can be the key for a successful high-volume application.
Process vs. threads
A process runs independently and isolated of other processes. It cannot directly access shared data in other processes. The resources of the process, e.g. memory and CPU time, are allocated to it via the operating system.
A thread is a so called lightweight process. It has its own call stack, but can access shared data of other threads in the same process. Every thread has its own memory cache. If a thread reads shared data it stores this data in its own memory cache. A thread can re-read the shared data.
A Java application runs by default in one process. Within a Java application you work with several threads to achieve parallel processing or asynchronous behavior.
Improvements and issues with concurrency
Within a Java application you work with several threads to achieve parallel processing or asynchronous behavior.
Limits of concurrency gains
Concurrency promises to perform certain task faster as these tasks can be divided into subtasks and these subtasks can be executed in parallel. Of course the runtime is limited by parts of the task which can be performed in parallel.
The theoretical possible performance gain can be calculated by the following rule which is referred to as Amdahl's Law.
If F is the percentage of the program which can not run in parallel and N is the number of processes, then the maximum performance gain is 1/ (F+ ((1-F)/n)).
Concurrency issues
-Threads have their own call stack, but can also access shared data. Therefore you have two basic problems, visibility and access problems.
-A visibility problem occurs if thread A reads shared data which is later changed by thread B and thread A is unaware of this change.
-An access problem can occur if several thread access and change the same shared data at the same time.
-Visibility and access problem can lead to
- Liveness failure: The program does not react anymore due to problems in the concurrent access of data, e.g. deadlocks.
- Safety failure: The program creates incorrect data.
Concurrency in Java Processes and Threads
A Java program runs in its own process and by default in one thread. Java supports threads as part of the Java language via the Thread code. The Java application can create new threads via this class.
Java 1.5 also provides improved support for concurrency with the in the java.util.concurrent package.
Locks and thread synchronization
Java provides locks to protect certain parts of the code to be executed by several threads at the same time. The simplest way of locking a certain method or Java class is to define the method or class with the synchronized keyword.
The synchronized keyword in Java ensures:
- that only a single thread can execute a block of code at the same time
- that each thread entering a synchronized block of code sees the effects of all previous modifications that were guarded by the same lock
- Synchronization is necessary for mutually exclusive access to blocks of and for reliable communication between threads.
public synchronized void critial() { // some thread critical stuff // here }
You can also use the synchronized keyword to protect blocks of code within a method. This block is guarded by a key, which can be either a string or an object. This key is called the lock. All code which is protected by the same lock can only be executed by one thread at the same time
For example the following data structure will ensure that only one thread can access the inner block of the add() and next()methods.
package java.threads.BenAtwa;
import java.util.ArrayList;
import java.util.List;
/*
* Data structure for a web crawler. Keeps track of the visited sites and keeps
* a list of sites which needs still to be crawled.
* @author salah atwa
*/
public class CrawledSites
{
private List<String> crawledSites = new ArrayList<String>();
private List<String> linkedSites = new ArrayList<String>();
public void add(String site)
{
synchronized (this) {
if (!crawledSites.contains(site))
{
linkedSites.add(site);
}
}
}
/** * Get next site to crawl. Can return null (if nothing to crawl) */
public String next()
{if (linkedSites.size() == 0) { return null; } synchronized (this) {
// Need to check again if size has changed
if (linkedSites.size() > 0)
{
String s = linkedSites.get(0);
linkedSites.remove(0);
crawledSites.add(s);
return s;
}
return null;
}
}
}
Volatile
If a variable is declared with the volatile keyword then it is guaranteed that any thread that reads the field will see the most recently written value. The volatile keyword will not perform any mutual exclusive lock on the variable.
As of Java 5 write access to a volatile variable will also update non-volatile variables which were modified by the same thread. This can also be used to update values within a reference variable, e.g. for a volatile variable person. In this case you must use a temporary variable person and use the setter to initialize the variable and then assign the temporary variable to the final variable. This will then make the address changes of this variable and the values visible to other threads.
The Java memory model
Overview
The Java memory model describes the communication between the memory of the threads and the main memory of the application.
It defines the rules how changes in the memory done by threads are propagated to other threads. The Java memory model also defines the situations in which a thread re-fresh its own memory from the main memory.
It also describes which operations are atomic and the ordering of the operations.
Atomic operation
An atomic operation is an operation which is performed as a single unit of work without the possibility of interference from other operations.
The Java language specification guarantees that reading or writing a variable is an atomic operation(unless the variable is of typelong or double). Operations variables of type long or double are only atomic if they declared with the volatile keyword. .
Assume i is defined as int. The i++ (increment) operation it not an atomic operation in Java. This also applies for the other numeric types, e.g. long. etc).
The i++ operation first reads the value which is currently stored in i (atomic operations) and then it adds one to it (atomic operation). But between the read and the write the value of i might have changed.
Since Java 1.5 the java language provides atomic variables, e.g. AtomicInteger or AtomicLong which provide methods likegetAndDecrement(), getAndIncrement() and getAndSet() which are atomic.
Memory updates in synchronized code
The Java memory model guarantees that each thread entering a synchronized block of code sees the effects of all previous modifications that were guarded by the same lock.
- Immutability and Defensive Copies
- Immutability
To make a class immutable make
- all its fields final
- the class declared as final
- the this reference is not allowed to escape during construction
- Any fields which refer to mutable data objects are
- private
- have no setter method
- they are never directly returned of otherwise exposed to a caller
- if they are changed internally in the class this change is not visible and has no effect outside of the class
An immutable class may have some mutable data which is uses to manages its state but from the outside this class nor any attribute of this class can get changed.
For all mutable fields, e.g. Arrays, that are passed from the outside to the class during the construction phase, the class needs to make a defensive-copy of the elements to make sure that no other object from the outside still can change the data
Defensive Copies
You must protect your classes from calling code. Assume that calling code will do its best to change your data in a way you didn't expect it. While this is especially true in case of immutable data it is also true for non-immutable data which you still not expect that this data is changed outside your class.
To protect your class against that you should copy data you receive and only return copies of data to calling code.
The following example creates a copy of a list (ArrayList) and returns only the copy of the list. This way the client of this class cannot remove elements from the list.
package threads.benatwa;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class MyDataStructure
{
List<String> list = new ArrayList<String>();
public void add(String s) { list.add(s); }
/** * Makes a defensive copy of the List and return it * This way cannot modify the list itself * * @return List<String> */
public List<String> getList() { return Collections.unmodifiableList(list); }
}
Threads in Java
The base means for concurrency are is the java.lang.Threads class. A Thread executes an object of typejava.lang.Runnable.
Runnable is an interface with defines the run() method. This method is called by the Thread object and contains the work which should be done. Therefore the "Runnable" is the task to perform. The Thread is the worker who is doing this task.
The following demonstrates a task (Runnable) which counts the sum of a given range of numbers. Create a new Java project called de.vogella.concurrency.threads for the example code of this section.
package threads.benatwa;
/**
* MyRunnable will count the sum of the number from 1 to the parameter
* countUntil and then write the result to the console.
* <p>
* MyRunnable is the task which will be performed *
* @author salah atwa
* */
public class MyRunnable implements Runnable
{
private final long countUntil;
MyRunnable(long countUntil) { this.countUntil = countUntil; }
@Override public void run()
{
long sum = 0;
for (long i = 1; i < countUntil; i++)
{
sum += i;
}
System.out.println(sum);
}
}
Using the Thread class directly has the following disadvantages.
- Creating a new thread causes some performance overhead
- Too many threads can lead to reduced performance, as the CPU needs to switch between these threads.
- You cannot easily control the number of threads, therefore you may run into out of memory errors due to too many threads.
Threads pools with the Executor Framework
Tip You find this examples in the source section in Java project called benatwa.concurrency.threadpools.
Thread pools manage a pool of worker threads. The thread pools contains a work queue which holds tasks waiting to get executed.
A thread pool can be described as a collection of Runnable objects (work queue) and a connections of running threads. These threads are constantly running and are checking the work query for new work. If there is new work to be done they execute this Runnable. The Thread class itself provides a method, e.g. execute(Runnable r) to add a new Runnable object to the work queue.
The Executor framework provides example implementation of the java.util.concurrent.Executor interface, e.g. Executors.newFixedThreadPool(int n) which will create n worker threads. The ExecutorService adds life cycle methods to the Executor, which allows to shutdown the Executor and to wait for termination.
Tip If you want to use one thread pool with one thread which executes several runnables you can use the Executors.newSingleThreadExecutor() method.
Create again the Runnable.
package benatwa.concurrency.threadpools;
/**
* MyRunnable will count the sum of the number from 1 to the parameter
* countUntil and then write the result to the console.
* <p> * MyRunnable is the task which will be performed
* salah atwa
* */
public class MyRunnable implements Runnable
{
private final long countUntil;
MyRunnable(long countUntil) { this.countUntil = countUntil; }
@Override public void run() {
long sum = 0;
for (long i = 1; i < countUntil; i++)
{
sum += i;
}
System.out.println(sum);
}
}
Now you run your runnables with the executor framework.
package ben atwa.concurrency.threadpools;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class Main
{
private static final int NTHREDS = 10;
public static void main(String[] args)
{
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
for (int i = 0; i < 500; i++)
{
Runnable worker = new MyRunnable(10000000L + i);
executor.execute(worker);
}
// This will make the executor accept no new threads
// and finish all existing threads in the queue
executor.shutdown();
// Wait until all threads are finish
executor.awaitTermination();
System.out.println("Finished all threads");
}
}
In case the threads should return some value (result-bearing threads) then you can use the java.util.concurrent.Callableclass.
Futures and Callables
The code examples for this section are created in a Java project called benatwa.concurrency.callables.
The executor framework presented in the last chapter works with Runnables. Runnable do not return result.
In case you expect your threads to return a computed result you can use java.util.concurrent.Callable. The Callable object allows to return values after completion.
The Callable object uses generics to define the type of object which is returned.
If you submit a Callable object to an Executor the framework returns an object of type java.util.concurrent.Future. This Future object can be used to check the status of a Callable and to retrieve the result from the Callable.
On the Executor you can use the method submit to submit a Callable and to get a future. To retrieve the result of the future use the get() method.
package benatwa.concurrency.callables;
import java.util.concurrent.Callable;
public class MyCallable implements Callable<Long>
{
@Override public Long call() throws Exception
{
long sum = 0; for (long i = 0; i <= 100; i++)
{ sum += i; }
return sum;
}
}
package benatwa.concurrency.callables;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class CallableFutures
{
private static final int NTHREDS = 10;
public static void main(String[] args)
{
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
List<Future<Long>> list = new ArrayList<Future<Long>>();
for (int i = 0; i < 20000; i++)
{
Callable<Long> worker = new MyCallable();
Future<Long> submit = executor.submit(worker);
list.add(submit);
}
long sum = 0;
System.out.println(list.size());
// now retrieve the result
for (Future<Long> future : list) { try { sum += future.get(); }
catch (InterruptedException e) { e.printStackTrace(); }
catch (ExecutionException e) { e.printStackTrace(); } }
System.out.println(sum);
executor.shutdown();
}
}
Deadlock
A concurrent application has the risk of a deadlock. A set of processes are deadlocked if all processes are waiting for an event which another process in the same set has to cause.
For example if thread A waits for a lock on object Z which thread B holds and thread B wait for a look on object Y which is hold be process A then these two processes are looked and cannot continue in their processing.