CS 475, Spring 2019, Homework 1, Due Feb 11, 11:00 am
Start by downloading the handout:
Please post questions on Piazza
Revision History:
- 1/30 Add 2 hints/notes to bottom of part 1
- 2/4: Draw attention to the fact that you can use, 1: synchronized(theList) for part 2, 2: you can add new fields to your KeyValueLoader, and 3: the semantics of addToList and deleteFromList
- 2/5: Add note about using the “kvStore” instance in the KeyValueLoader
- 2/6: If you use an executor, make 100% certain that you do not return from loadFile before the executor is done — remember from the code example in class, we do this by calling executor.shutdown() and then executor.awaitTermination(…), using a timeout that is far greater than might ever be used to actually do the work (say, several minutes)
- 2/10: Add hint for part 4
The purpose of this assignment is to get you familiar with basic concurrency control topics in Java. For this assignment, you will build a simple key-value store (KV Store) and client in Java. KV stores are widely used as layers underneath complex distributed systems, and examples include Redis, Voldemort, Dynamo. KV stores are very interesting to study from the perspective of concurrent and distributed software because the interface that they expose to client applications is incredibly simple (appearing effectively to be just a HashMap), but their implementation can be incredibly complex. In particular, once a KV store needs to be able to store more data than can fit in main memory on a single machine, deciding how to swap data to disk can be an interesting problem. If the KV store is then distributed amongst multiple machines, then the specific mechanisms used to keep data in sync can become quite complicated as well.
For this assignment, you will implement a simple KV store that operates entirely within memory, and on a single machine. However, the KV store will be accessed by multiple client threads, and hence, must maintain thread safety. You will also implement a data loader tool, which will read a file in line-by-line, parse each line, and then load that data into the KV store.
General requirements:
We have provided you a baseline implementation of the KV store that has various methods stubbed out, which you will implement. You may not include any additional libraries (e.g. download and and require additional JARs), although feel free to use any of the various libraries included with Java or already included by the starter project. Your project will be compiled and tested using Java 8, so please do not use any Java 9+ exclusive features (the first long-term support version of Java > 8 just came out in September; we’ll switch to Java 11 next semester, sorry).
Your KV Store will be compiled and tested using apache maven, which automatically executes the provided JUnit tests. Please install Maven on your computer. Unfortunately, Maven is not installed on Zeus, however you can download the most recent version (e.g. apache-maven-3.5.2-bin.zip) and unzip it in your home directory. Then, to run maven, type ~/apache-maven-3.5.2/bin/mvn in the snippets below (instead of just mvn). Note that you can easily import maven projects into eclipse and IntelliJ (we suggest IntelliJ over eclipse).
To compile and run your KV store, run mvn package and then run java -jar target/kvstore-2019.3.1-SNAPSHOT.jar input.dat.
Your KV store will be automatically graded for correctness (note that there will be a manual grading phase to check hard-to-automatically-catch concurrency issues). Included with your handout is a simple version of the (large) test script that we will use to grade your assignment. Upon submitting your assignment, our server will automatically compile and test your assignment and provide you with test results. You can resubmit 50 times before the deadline without penalty. To run these tests, simply execute mvn -Ptest test (of course, if you do this first off, you’ll see that they all fail!)
Note: Your code must compile and run on the autograder, under Java 8. It is unlikely that you will have any difficulties developing on a Mac, Linux or Windows. When you feel satisfied with implementing one phase of the assignment, submit to AutoLab and verify that AutoLab agrees.
Tips and References
You might find it useful to reference the official Java tutorial on concurrency, which does a nice job outlining at a high level how you can create threads and locks. If you find any useful references, please feel free to share them on Piazza and we will update this page as well.
Part 1: The Simplest Key Value Store (20%)
To get started, you’ll implement the basic functionality of the key value store, and ensure that it works in a normal (single-threaded) environment. This will act as something of a baseline for Java programming without concurrency. The tricky parts will come when we make this functionality thread-safe (in Part 2).
Your KV store will support two different kinds of values: strings and lists. Hence, a key might be mapped to: (1) nothing, (2) a single string, and (3) a list (containing any number of strings).
You will implement this behavior in edu.gmu.cs475.KeyValueStore, specifically in the following four methods:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
|
/**
* Retrieve an element from this key value store
*
* @param key the key to retrieve
* @return The value mapped to this key, if one exists, otherwise null
* @throws NullPointerException if key is null
*/
public abstract Object get(String key);
/**
* Returns the type of element that is stored at this key
*
* @param key the key to inquire about
* @return A ValueType indicating the kind of value mapped to this key, if one exists, otherwise null
* @throws NullPointerException if key is null
*/
public abstract ValueType getType(String key);
/**
* Lists all of the keys that are currently known to this key-value store
*
* @return A set containing all currently valid keys
*/
public abstract Set listKeys();
/**
* Sets a key to be the given value
*
* @param key key to set
* @param value value to store
* @throws NullPointerException if key or value is null
* @throws IllegalArgumentException if value is not a String or List.
*/
public abstract void set(String key, Object value);
/**
* Removes this key and any values stored at it from the map
*
* @param key key to remove
* @return true if the key was successfully deleted, or false if it did not exist
* @throws NullPointerException if key is null
*/
public abstract boolean remove(String key);
/**
* Appends the given item to the list stored at the given key
*
* @param key key that contains the list
* @param newItem item to append to the list at the given key
* @throws NullPointerException if key is null
* @throws IllegalArgumentException if key doesn't exist, or key does not represent a list
*/
public abstract void addToList(String key, String newItem) throws IllegalArgumentException;
/**
* Deletes at most one occurrence of itemToDelete from the list stored at the given key
*
* @param key key containing the list
* @param itemToDelete item to delete from the list
* @throws IllegalArgumentException if key doesn't exist, or key does not represent a list
* @throws NullPointerException if key is null
* @returns true if the item was deleted from the list (false if itemToDelete didn't exist in that list)
*/
public abstract boolean deleteFromList(String key, String itemToDelete) throws IllegalArgumentException;
|
Note the following requirements:
- You should feel free to add whatever methods you would like to the KeyValueStore class, but you must not change the names or parameters of the methods listed above.
- You must not add any fields to any classes.
- You can not use your own HashMap, but instead it is almost as easy: You must call _get, _set, _remove, and _listKeys in your KeyValueStore to access the underlying data store. You must not maintain any mapping yourself between keys and values.
Additional comments/tips:
- 1/30: set should throw an IllegalArgumentException if the value is not a String or List — this means that the value can be either any instance of the class java.lang.String or any instance of java.util.List (and its subtypes) instanceOf is what you should be using here.
- 1/30: getType will likely also need to call get and use instanceof
Precise grading breakdown:
- Automated tests: 15 points
- 9 JUnit tests, 1 2/3 points each
- 5 points from manual grading
Part 2: Managing Thread Safety (35%)
Once you are satisfied that your key-value store works correctly in a single-threaded environment, your next task will be to add locking to it. You should implement all of your concurrency control through (1) the this.bigLock object (a java.util.concurent.locks.Lock object), and (2) by synchronizing on list objects. (2/4: Note this is a hint – you should be making calls to the lock, and also to synchronized(theListObject)).
Note the following concurrency requirements:
- You must not allow concurrent calls to any of the following methods: _get, _set, _remove, or _listKeys
- For the operations addToList and deleteFromList: if addToList (or deleteFromList) are concurrently called for two different keys, you must allow them to execute concurrently (subject to requirement 1). (2/4: Note that a side effect of this requirement is that a key might be deleted from the KV store while addToList is running [after it calls _get and before it calls add] – this is OK).
Precise grading breakdown:
- 5 JUnit tests, 6 points each
- 5 points from manual grading
Part 3: Bulk Data Loader (35%)
Once you have gotten the KV store running, the last component to develop will be the data loader (in the KeyValueLoader class). The idea here is to implement a simple component that takes in a file of key=value pairs, then inserts them into the KV store. The semantics of the methods that you need to implement are detailed below:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
|
/**
* Given a single line, load it into the kvStore.
*
* The line is guaranteed to be of the format key=newValue, where "key" may not
* contain an equals character ('='), and neither key nor value are null.
*
* The process for storing a line key=newValue is:
* 1. Check to see if the key exists
* 2. If the key does not exist, then add this key/value pair (as a string)
* 3. If the key does exist, and it is a list, then add this value to that key's list
* 4. If the key does exist, and its value is a string [oldValue], then replace the string with a list,
* containing the initial elements [oldValue, newValue]. The list must be created by calling newList.
*
* @param line
*/
protected abstract void loadLine(String line);
/**
* Loads all of the lines in the given file into the kvStore
*
* Must load lines in parallel. You must perform this parallelization by
* submitting tasks to the executor object -- one per line in the inputFile
*
* @param inputFile file to load in
* @throws IOException if any underlying error occurs when reading the file
*/
protected abstract void loadFile(Path inputFile) throws IOException;
|
For this component, you should feel free to add any (non-static) fields or methods to the KeyValueLoader. (2/4: Hint: you will need to have your own map of locks for each key, and ensure that you acquire and release this lock in loadLine. As per the prior sentence, you should feel free to add new fields or methods to the KeyValueLoader. Start by adding a method getLockForKey(String key), and reason through its correctness first.)
2/5: To pass the tests, you must use the “this.kvStore” as your instance of the KeyValueStore for storing the processed lines.
2/6: If you use an executor, make 100% certain that you do not return from loadFile before the executor is done — remember from the code example in class, we do this by calling executor.shutdown() and then executor.awaitTermination(…), using a timeout that is far greater than might ever be used to actually do the work (say, several minutes)
Note: you must not use Files.lines(..).parallel().forEach(...) to read the line in and process it in parallel: in Java 8, this stream does not support parallel processing (and it happens sequentially).
Take note of the concurrency issues that you may run into, given that loadLine will likely be called from multiple threads simultaneously. In particular, take note of your handling of step (4) in the process described for the method (converting an existing string into a list).
Precise grading breakdown:
- 3 JUnit tests, 10 points each
- 5 points from manual grading
Part 4: Testing the Tests (10%)
We have included several functional tests that simply test the behavior of each of these methods. We have not provided any tests (for you to see) that test the behavior of your software under concurrency (the tests for parts 2 and 3). One of your tasks is to write such tests.
Note that it is generally impossible to automatically prove that concurrent programs have no races, it is common instead to test for low-hanging fruit. For instance, it is possible to write tests that demonstrate that two events can happen concurrently, and tests that show that it’s unlikely (but not impossible) that two other events do not happen concurrently.
We are asking you to implement one of the tests that we are using to test part 2: testAddOrDeleteFromDifferentListsShouldBeConcurrent. The goal of this test is to demonstrate that it is possible to call addToList(key1,value1) and addToList(key2,value2) concurrently, and for both of the underlying add operations to actually occur concurrently (and, similarly for deleteFromList). To do so, you can use any concurrency tools that you would like: make as many threads as you need (keep it under 10 though, please), use monitors ( synchronized, with wait and notify), locks, semaphores, latches, etc. There are many ways to do this correctly – it is up to you to decide how to proceed.
Hint (2/5): If you’re unsure where to start, we suggest that you start by using just threads, using Thread.join(timeout), with no extra concurrency control.
Hint (2/10): The illustration below shows the interleaving that we are trying to witness: that while Thread 1’s addToList is still running (that is to say, before addToList returns in Thread 1), Thread 2 successfully starts and adds an item to the same list at the same key as Thread 1. Note that in the graphic we show the two linearization points (from your call to kvStore.get and to list.add, which should both be protected with different locks). Because list.add (or remove) should be protected with a lock (or synchronized block), it should not be possible to see the two list.add’s overlap.
Implement your test in Part4TestingTheTestTests. Feel free to add any non-static fields to this class.
Your test will be tested on Autolab on three different KeyValueStores: a correct KVStore (your test should pass on this implementation) and two incorrect KVStores (your test should fail on both of these implementations).
No marks will be awarded for empty tests, or tests that have no assertions (note that otherwise, these tests would all pass on our correct implementation). You will receive full marks only if your test successfully passes on the correct implementation and fails on the buggy versions.
Grading
Your assignment will be graded on a series of functional tests, making sure that it implements the specification above. Your test will be graded by running it on purposely buggy code and verifying that it throws some errors, and then manually inspecting them for completeness. We have provided a (not exhaustive) test suite which should help you judge (on your own) how well you have done with the functional requirements. We may add additional tests beyond what we are providing you with now, so please do not rely entirely on them.
Hand In Instructions
You must turn in your assignment using Autolab (You MUST be on the campus network, or connected to the GMU VPN to connect to Autolab). If you did not receive a confirmation email from Autolab to set a password, enter your @gmu.edu (NOT @masonlive) email, and click “forgot password” to get a new password.
Create a zip file of only the src directory in your assignment (please: .zip, not .tgz or .7z etc). When you upload your assignment, Autolab will automatically compile and test it. You should verify that the result that Autolab generates is what you expect. Your code is built and tested in a Linux VM. Assignments that do not compile using our build script will receive a maximum of 50%. Note that we have provided ample resources for you to verify that our view of your assignment is the same as your own: you will see the result of the compilation and test execution for your assignment when you submit it.
You can resubmit your assignment an unlimited number of times before the deadline. Note the course late-submission policy: assignments will be accepted up until 24 hours past the deadline at a penalty of 10%; after 24 hours, no late assignments will be accepted, no exceptions.
Questions
Please post questions on Piazza