AVRO-4069: Remove Reader String Cache from Generic Datum Reader#3194
AVRO-4069: Remove Reader String Cache from Generic Datum Reader#3194belugabehr wants to merge 2 commits intoapache:mainfrom
Conversation
|
@martin-g Thanks so much for your support on these PRs. Here's another one that needs attention that has a positive performance impact. |
martin-g
left a comment
There was a problem hiding this comment.
Have you looked at the Git commit / JIRA that introduced this cache ?
I guess it has been added for a reason.
Now you remove one of the tests without good reason (at least I don't see it) and without adding better tests.
https://issues.apache.org/jira/browse/AVRO-3531 talks about a real world scenario where the code without the cache caused issues.
| private static final Random r = new Random(System.currentTimeMillis()); | ||
|
|
||
| @Test | ||
| void readerCache() { |
There was a problem hiding this comment.
Why do you delete the test ?
Looking at it I think it should still work. You just need to remove the ctor parameter.
There was a problem hiding this comment.
I believe this is testing the thread-safe nature of the code. This code should not be considered thread-safe and therefore should be removed.
|
AVRO-3531 was a thread safety bug. Its fix #1719 changed the types of the caches and moved them into I don't know whether |
|
@KalleOlaviNiemitalo @martin-g Thank you for the correspondence. I'm taking another look at this. It seems less than ideal that there is synchronized code paths for fast-reading Avro data. I am not sure why this Collection was ever updated to be synchronized as this code is inherently not thread-safe. The GenericDatumReader If we can relax this requirement then performance is measurably better as a read path will have no synchronization overhead. |
3f2cc32 to
4ff72b2
Compare
| // For some of the more common classes, implement specific routines. | ||
| // For more complex classes, use reflection. | ||
| if (c == Integer.class) { | ||
| return Integer.parseInt(s, 10); |
Check notice
Code scanning / CodeQL
Missing catch of NumberFormatException Note
| IdentitySchemaKey key = (IdentitySchemaKey) obj; | ||
| return this == key || this.schema == key.schema; | ||
| if (c == Long.class) { | ||
| return Long.parseLong(s, 10); |
Check notice
Code scanning / CodeQL
Missing catch of NumberFormatException Note
| public ReaderCache(Function<Schema, Class> findStringClass) { | ||
| this.findStringClass = findStringClass; | ||
| if (c == Float.class) { | ||
| return Float.parseFloat(s); |
Check notice
Code scanning / CodeQL
Missing catch of NumberFormatException Note
| final Function<String, Object> ctor = stringCtorCache.computeIfAbsent(c, this::buildFunction); | ||
| return ctor.apply(s); | ||
| if (c == Double.class) { | ||
| return Double.parseDouble(s); |
Check notice
Code scanning / CodeQL
Missing catch of NumberFormatException Note
4ff72b2 to
5b2fdf6
Compare
|
I am taking another pass at this one. The performance improvements are just too significant to pass up (11%+ in my local benchmarks). Here are my benchmarks: The reason I even investigated this is because Visual VM profiling was lighting up on There is one caveat in the JavaDocs:
I get around this by exploiting the fact that the number of items in the map is fixed as instantiation. Therefore, I use a Map size sufficiently large such that there are no key collisions and therefore no chaining involved. |
|
One thing to note is that I had to change the |
|
Sorry, I am not familiar with this code to be able to tell whether the changes are good or not. |
|
@martin-g Thanks for the pass. Any ideas who else might be available to review? |
What is the purpose of the change
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Documentation