Java is the new C: Experiment: Cache effects when scheduling Actors with F/J, Threadpool, Dedicated Threads

Tuesday, October 14, 2014


Update: I accidentally used newSingleThreadScheduledExecutor instead of newFixedThreadPool(1) for the "Dedicated" test case [IDE code completion ...]. With this corrected, "Dedicated" outperforms by an even larger margin. See the follow-up post for updated results and cache-miss measurements taken with the "perf" tool (they do not really change the big picture).

The experiment in my last post had a serious flaw: in an actor system, the messages of a single actor are processed one after the other. However, by naively submitting message-processing jobs to executors, private actor state was accessed concurrently, leading to "false sharing" and cache-coherency costs, especially for small local state sizes.

Therefore I modified the test: for each actor, processing of the next message is scheduled only once the previous one has finished. This way the experiment correctly resembles the behaviour of typical actors (or lightweight processes/tasks/fibers), without concurrent access to the same memory region.
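A minimal sketch of this scheduling rule, using a hypothetical `SequentialActor` class (not the benchmark's actual code): each actor keeps its own mailbox, and only after one message has been processed does it submit the next processing step to the shared executor. Different messages of the same actor may therefore run on different threads, but never at the same time. Error handling is omitted; an exception thrown by a message would stop the chain.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical actor: messages are processed strictly one after another,
// although each processing step may run on any thread of the shared executor.
class SequentialActor {
    private final ExecutorService executor;
    private final ConcurrentLinkedQueue<Runnable> mailbox = new ConcurrentLinkedQueue<>();
    // Messages enqueued but not yet processed; the 0 -> 1 transition starts processing.
    private final AtomicInteger pending = new AtomicInteger();

    SequentialActor(ExecutorService executor) {
        this.executor = executor;
    }

    void send(Runnable message) {
        mailbox.add(message);
        // Only the sender that finds the actor idle schedules a processing step.
        if (pending.getAndIncrement() == 0) {
            executor.execute(this::processNext);
        }
    }

    private void processNext() {
        Runnable message = mailbox.poll();
        message.run();
        // The next message is scheduled only after this one has finished.
        if (pending.decrementAndGet() > 0) {
            executor.execute(this::processNext);
        }
    }
}

class SequentialActorDemo {
    // Unsynchronized actor state: safe only because messages never overlap.
    private static int counter;

    static int run(int messages) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        SequentialActor actor = new SequentialActor(pool);
        CountDownLatch done = new CountDownLatch(messages);
        counter = 0;
        for (int i = 0; i < messages; i++) {
            actor.send(() -> { counter++; done.countDown(); });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return counter;
    }
}
```

If two messages of the actor ever ran concurrently, `counter++` would lose updates and `SequentialActorDemo.run(n)` would return less than `n`.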

Experiment roundup:

Several million messages are sent to several classes simulating actors. Message processing is simulated by reading and writing the private, actor-local state in random order. There are more actors (24 to 8000) than threads (6 to 8). Note that any results established here also apply to other lightweight concurrency schemes like goroutines, fibers, tasks, ...

The test is run with

ThreadPoolExecutor
WorkStealingExecutor (a Fork/Join pool)
Dedicated threads (each actor is permanently assigned to one worker thread)
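A sketch of how the three schemes could be set up with the standard `java.util.concurrent` factories. Assumptions: the thread pool is `Executors.newFixedThreadPool`, the work-stealing executor is a `ForkJoinPool` obtained via `Executors.newWorkStealingPool`, and "Dedicated" uses one single-threaded executor per worker (`newFixedThreadPool(1)`, as described in the update at the top of the post), with actor `i` assigned to worker `i % threads`. The class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical factory for the three execution schemes compared in the benchmark.
class ExecutionSchemes {
    // One shared queue, no work stealing: any idle thread takes the next task.
    static ExecutorService threadPool(int threads) {
        return Executors.newFixedThreadPool(threads);
    }

    // Fork/Join pool with per-worker deques; idle workers steal from busy ones.
    static ExecutorService workStealing(int threads) {
        return Executors.newWorkStealingPool(threads);
    }

    // One single-threaded executor per worker thread.
    static ExecutorService[] dedicated(int threads) {
        ExecutorService[] workers = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = Executors.newFixedThreadPool(1);
        }
        return workers;
    }

    // Fixed actor-to-thread assignment used by the "Dedicated" scheme.
    static ExecutorService workerFor(ExecutorService[] workers, int actorId) {
        return workers[actorId % workers.length];
    }
}
```

In the "Dedicated" scheme every message for actor `i` would then be submitted to `workerFor(workers, i)`, so that actor's state is always touched by the same thread and core.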

Simulating an Actor accessing local state:
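The benchmark's own code is not reproduced here. As a hedged sketch, simulated message processing could look like the following, assuming each actor owns a private `byte[]` of `localStateBytes` and each message performs `memAccesses` read-modify-write operations at random positions. The names are hypothetical and only mirror the experiment's variables (local state size, memory accesses per message).

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical simulated actor: message processing = random reads/writes of private state.
class SimulatedActor {
    private final byte[] localState; // private, unshared actor state
    private long checksum;           // accumulates values read, so reads have an observable effect

    SimulatedActor(int localStateBytes) {
        localState = new byte[localStateBytes];
    }

    // Processes one message: memAccesses read-modify-write operations at random offsets.
    void processMessage(int memAccesses) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < memAccesses; i++) {
            int index = rnd.nextInt(localState.length);
            localState[index]++;           // write
            checksum += localState[index]; // read
        }
    }

    long checksum() {
        return checksum;
    }
}
```

With a small state, all accessed cache lines stay hot in L1; as `localStateBytes` grows, random offsets increasingly hit lines that are no longer cached.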

Full Source of Benchmark

Hypothesis:

As ThreadPoolExecutor and WorkStealingExecutor schedule each message on an arbitrary thread, they will produce more cache misses than pinning each actor to a fixed thread. My speculation is that work stealing cannot make up for the cost of these cache misses.

(Some) Variables:

Number of worker threads
Number of actors
Amount of work per message
Locality / Size of private unshared actor state

8 threads, 24 actors, 100 memory accesses per message

Interpretation:

For this particular load, fixed thread assignment outperforms the shared executors. Note: the larger an actor's local state, the higher the probability that prefetching fails, resulting in a cache miss. In this scenario my hypothesis holds: work stealing cannot make up for the number of cache misses. Fixed thread assignment profits because some state from a previously processed message is likely still in the cache when the next message for that actor is processed.
It's remarkable how badly ThreadPoolExecutor performs in this experiment.

This scenario is typical for a backend-type service: there are few actors, each with a high load. A front-end server with many clients probably has more actors, as there is typically one actor per client session. Therefore, let's push the number of actors up to 8000:

8 threads, 8000 actors, 100 memory accesses per message

Interpretation:

With this number of actors, all execution schemes suffer from cache misses, as the accumulated state of 8000 actors is too big to fit into the L1 cache. Therefore, the cache advantage of fixed thread assignment ("Dedicated") no longer makes up for the lack of work stealing. The work-stealing executor outperforms the other execution schemes when a large amount of state is involved.
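A back-of-the-envelope check of this argument, using the cache sizes from the lscpu output of the test machine (32K L1d, 256K L2, 12288K L3) and assuming an actor's footprint is just its local state bytes (object headers and any other data in the caches are ignored). `StateFootprint` is a hypothetical helper.

```java
// Hypothetical helper: total actor state compared with the cache sizes reported by lscpu.
class StateFootprint {
    static final long L1D = 32 * 1024;
    static final long L2 = 256 * 1024;
    static final long L3 = 12288 * 1024;

    static long totalBytes(int actors, long localStateBytes) {
        return actors * localStateBytes;
    }

    // Smallest cache level the combined state would fit into, ignoring everything else in it.
    static String smallestFittingCache(long bytes) {
        if (bytes <= L1D) return "L1d";
        if (bytes <= L2) return "L2";
        if (bytes <= L3) return "L3";
        return "memory";
    }

    public static void main(String[] args) {
        int[] actorCounts = {24, 240, 8000};
        long[] stateSizes = {64, 256, 2000, 4000, 32000};
        for (int actors : actorCounts) {
            for (long state : stateSizes) {
                long total = totalBytes(actors, state);
                System.out.printf("%5d actors x %6d bytes = %10d bytes -> %s%n",
                        actors, state, total, smallestFittingCache(total));
            }
        }
    }
}
```

For example, 24 actors with 64 bytes of state need 1,536 bytes in total, while 8000 such actors need 512,000 bytes. Even split evenly over 8 cores, that is 64,000 bytes per core, roughly twice the 32K L1d.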

This is a somewhat unrealistic scenario: in a real server application, client requests probably do not arrive round robin; some clients are more active than others. So in practice I'd expect "Dedicated" to gain at least some advantage from a higher cache hit rate. Still, when serving many stateful clients, work stealing can be expected to outperform.
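To move the test closer to that situation, messages could be sent to actors with a skewed distribution instead of round robin. A minimal sketch, assuming a Zipf-like distribution where actor `k` is chosen with probability proportional to `1/(k+1)`; `SkewedTargets` is a hypothetical modification, not part of the published benchmark.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical skewed target selection: a few actors receive most of the messages.
class SkewedTargets {
    private final double[] cumulative; // cumulative probabilities, last entry == 1.0

    SkewedTargets(int actors) {
        cumulative = new double[actors];
        double sum = 0;
        for (int k = 0; k < actors; k++) {
            sum += 1.0 / (k + 1); // Zipf-like weight for actor k
            cumulative[k] = sum;
        }
        for (int k = 0; k < actors; k++) {
            cumulative[k] /= sum; // normalize to [0, 1]
        }
    }

    // Returns the index of the actor that receives the next message.
    int next(Random rnd) {
        double u = rnd.nextDouble();
        int pos = Arrays.binarySearch(cumulative, u);
        return pos >= 0 ? pos : -pos - 1; // insertion point = first entry greater than u
    }
}
```

Round robin would simply be `messageIndex % actors`; with the skewed selection, the most active actors receive messages often enough that their state has a better chance of still being cached.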

To get a third data point, here is the same test with 240 actors:

These results complete the picture: with fewer actors, cache effects outweigh work stealing. The higher the number of actors, the more cache misses occur, so work stealing starts to outperform dedicated threads.

Modifying other variables

Number of memory accesses

If message processing performs few memory accesses, work stealing improves relative to the other two schemes. The reason: fewer memory accesses mean fewer cache misses, so work stealing weighs more heavily in the overall result.

Worker threads: 8, actors: 24, memory accesses per message: 20

local state bytes  WorkStealing avg  ThreadPool avg  Dedicated avg
64                 505               2001            557
256                471               1996            561
2000               589               2109            600
4000               625               2096            600
32000              687               2328            640
320000             667               3070            738
3200000            1341              3997            1428

Fewer worker threads

Fewer worker threads (e.g. 6) increase the probability that an actor's message is scheduled on the "right" thread by accident, so the cache-miss penalty is lower, which lets work stealing perform better than "Dedicated" (the fewer threads are used, the smaller the cache advantage of fixed "Dedicated" threads). Conversely, as the number of cores increases, fixed thread assignment pulls ahead.

Worker threads: 6, actors: 18, memory accesses per message: 100

local state bytes  WorkStealing avg  ThreadPool avg  Dedicated avg
64                 2073              2498            2045
256                1735              2272            1815
2000               2052              2412            2048
4000               2183              2373            2130
32000              3501              3204            2822
320000             3089              2999            2543
3200000            6579              6047            6907
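A rough model of the "by accident" effect described above, assuming a shared pool picks a thread uniformly at random for each message (real executors do not schedule uniformly at random, so this is only an approximation): the chance that a message runs on the same thread as the actor's previous message is 1/threads, i.e. 12.5% with 8 threads versus about 16.7% with 6, while "Dedicated" always hits the same thread. The hypothetical simulation below checks this.

```java
import java.util.Random;

// Hypothetical model: each message of an actor lands on a uniformly random worker thread.
class SameThreadOdds {
    // Fraction of consecutive messages that ran on the same thread as their predecessor.
    static double simulate(int threads, int messages, long seed) {
        Random rnd = new Random(seed);
        int previous = rnd.nextInt(threads);
        int sameThread = 0;
        for (int i = 1; i < messages; i++) {
            int current = rnd.nextInt(threads);
            if (current == previous) sameThread++;
            previous = current;
        }
        return (double) sameThread / (messages - 1);
    }

    public static void main(String[] args) {
        for (int threads : new int[] {6, 8, 12}) {
            System.out.printf("%2d threads: simulated %.3f, expected %.3f%n",
                    threads, simulate(threads, 1_000_000, 42), 1.0 / threads);
        }
    }
}
```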

Machine tested:

(real cores, no Hyper-Threading)

$ lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    1
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Stepping:              2
CPU MHz:               3067.058
BogoMIPS:              6133.20
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     1,3,5,7,9,11
NUMA node1 CPU(s):     0,2,4,6,8,10

Conclusion

Executor performance depends heavily on the use case. There are workloads where cache locality dominates, giving an advantage of up to 30% over the work-stealing executor.
Executor performance varies among CPU types and models (L1 cache size and the cost of a cache miss matter here).
Work stealing could be viewed as the better overall solution, especially if many L1 cache misses are to be expected anyway.
The ideal executor would be work stealing with a soft actor-to-thread affinity. This would combine the strengths of both execution schemes and would yield significant performance improvements for many workloads.
Vanilla thread pools without work stealing or actor-to-thread affinity perform significantly worse and should not be used to execute lightweight processes.
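The soft-affinity idea could be sketched roughly as follows. This is a hypothetical, simplified design, not an existing JDK class: each worker owns a deque, a task is queued at its actor's home worker, the owner takes work from the head of its own deque (cache-warm), and an idle worker steals from the tail of other workers' deques. It assumes, as in the modified benchmark, that at most one task per actor is queued at any time (the next message is only submitted after the previous one finished), so stealing can never run two messages of the same actor concurrently. Idle workers simply park briefly instead of using proper signalling, and there is no error handling.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

// Hypothetical executor: prefers an actor's home worker, but lets idle workers steal.
class SoftAffinityExecutor {
    private final ConcurrentLinkedDeque<Runnable>[] queues;
    private final Thread[] workers;
    private volatile boolean running = true;

    @SuppressWarnings("unchecked")
    SoftAffinityExecutor(int threads) {
        queues = new ConcurrentLinkedDeque[threads];
        workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            queues[i] = new ConcurrentLinkedDeque<>();
        }
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> workLoop(id), "soft-affinity-" + i);
            workers[i].setDaemon(true);
            workers[i].start();
        }
    }

    // Soft affinity: the task is queued at the actor's home worker.
    void execute(int actorId, Runnable task) {
        queues[Math.floorMod(actorId, queues.length)].addLast(task);
    }

    private void workLoop(int id) {
        while (running) {
            Runnable task = queues[id].pollFirst(); // own (cache-warm) work first
            if (task == null) {
                task = steal(id);                    // otherwise steal from other workers
            }
            if (task != null) {
                task.run();
            } else {
                LockSupport.parkNanos(50_000);       // idle: back off briefly
            }
        }
    }

    private Runnable steal(int thief) {
        for (int k = 1; k < queues.length; k++) {
            Runnable task = queues[(thief + k) % queues.length].pollLast();
            if (task != null) return task;
        }
        return null;
    }

    void shutdown() {
        running = false;
    }

    // Small self-check: one task for each of `tasks` actors; returns how many tasks ran.
    static long demo(int threads, int tasks) {
        SoftAffinityExecutor executor = new SoftAffinityExecutor(threads);
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            executor.execute(i, done::countDown);
        }
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdown();
        return tasks - done.getCount();
    }
}
```

The design choice here is the split between "home first" and "steal when idle": as long as workers are busy, an actor keeps running where its state is cached; only idle capacity triggers migration.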

Source of Benchmark
