TheCosmicStaff Kernel For M21
Features
- Introduce Dynamic SchedTune Boost
- Introduce SchedTuneAssist
- Introduce CPUSetsAssist
- ThinLTO
- Sultan’s Simple Low Memory Killer
- BBR TCP congestion algorithm
- fq_codel network scheduler (qdisc)
- Enable power-efficient workqueues out of the box
- Add PowerSuspend Driver
- Built with Dead code elimination enabled
- Enabled BLK cgroup IO throttling
- Silence unwanted logging
- Enabled Gentle Fair Sleepers
- Affine important userspace components to ARM big cores
- Introduce Srandom
- Scheduler improvements
- UFS IO improvement
- Graphics performance improvement
- Power Efficiency improvement
- Optimized Console FrameBuffer
- 100 Hz timer
- Boeffla Wakelock blocker
- ARM64 targeted optimizations
- IO_STAT disabled out of the box
- Kill unwanted Google services
- Disable Audit treewide
- Make use of cacheline alignment
- Make workqueue high priority
- Tweak cpuidle governor for better battery life
- 3x faster integer sqrt
- Enable Write Back Caching
- Enable I/O throttling
- Disable CRC checks (30% IO speed improvement)
- Set dirty ratio to 40%
- Add NEON accelerated XOR implementation
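Some of the features above show up in standard Linux procfs entries, so you can check them on a running device. The Python sketch below reads three of them. It assumes Python is available on the phone (for example through a terminal app), and the names `SETTINGS` and `read_settings` are made up for this example, not part of the kernel. An entry showing a different value may only mean the option is built in but not set as the default.

```python
from pathlib import Path

# Standard Linux procfs entries; whether each one exists depends on the kernel config.
SETTINGS = {
    "tcp_congestion_control": "sys/net/ipv4/tcp_congestion_control",
    "default_qdisc": "sys/net/core/default_qdisc",
    "dirty_ratio": "sys/vm/dirty_ratio",
}


def read_settings(proc_root="/proc"):
    """Return {name: value or None} for each entry in SETTINGS."""
    result = {}
    for name, rel_path in SETTINGS.items():
        try:
            result[name] = (Path(proc_root) / rel_path).read_text().strip()
        except OSError:
            # Missing or unreadable entry: report it instead of crashing.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, value in read_settings().items():
        print(f"{name}: {value if value is not None else 'unavailable'}")
```

Passing a different `proc_root` lets you run the same check against a copied directory tree instead of the live device.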
Changelog
R6:
- Introduce CPUSetsAssist
- ThinLTO (perf improvement)
- Introduce Sultan’s Simple Low Memory Killer
- BBR TCP congestion algorithm
- fq_codel network scheduler (qdisc)
- Enable power-efficient workqueues out of the box
- Add PowerSuspend Driver (almost eliminate idle battery drain)
- Built with Dead code elimination enabled
- Enabled BLK cgroup IO throttling
- Silence unwanted logging
- Enabled Gentle Fair Sleepers
- Affine hwcomposer to big CPUs (lowered jitter)
- Affine surfaceflinger to big CPUs (smoother ui experience)
- Affine Wi-Fi components to big CPUs (lowered latency)
- Affine unbound workqueues to little CPUs (lowered power consumption)
- Perform PID map reads on the little CPU cluster (lowered power consumption)
- Run nocb kthreads on little CPUs (lowered power consumption)
- Free dead tasks asynchronously in finish_task_switch() (reduce resource wastage)
- Affine Samsung’s HyPer HAL to the big CPU cluster (quicker system boosting)
- Hide Magisk mounts for IsolatedService (prevent Magisk detection)
- Updates to
- Drop top-app level to 3
- Improvements to boeffla Wakelock blocker
- Add support for Gentle_fair_sleepers
- Improvements to rcu
- Improvements to zram module
- Slightly faster booting
- Add state notifier driver (works in conjunction with PowerSuspend)
- Enable per-process reclaim
- Upstream to 4.14.140
R5-noboost:
- Merge boost and noboost into one
- Bring in some fixes to Samsung drivers
- Add support for Dynamic SchedTune Boost
- Add support for dynamic prefer_idle
- Stub out debug prints by default (Graphics perf improvement)
- Reserve caches for small, high-frequency memory allocations (Reduce latency)
- Use power-efficient workqueues for drm (Battery life improvement)
- Use power-efficient workqueues for block (Battery life improvement)
- Minimize number of tasks to load balance
- Add Srandom (5 times faster than urandom)
- Resolve sched_feat() at compile time to improve code optimization (Perf improvement)
- Upstream to Linux 4.14.124
- Only force UX tasks to big cores (Power saving)
- Disable UFS debugging (IO perf improvement)
- Use SCHED_RR in place of SCHED_FIFO for all users (UX improvement)
- Change default SCHED_RR timeslice from 100 ms to 1 jiffy (Lowered latency)
- Make zsmalloc copy rather than mapping pages (Memory perf improvement)
- Disable SELinux audit
- Don’t dynamically allocate single-use structs (Reduce overhead)
- Avoid dynamic memory allocation for small write buffers (Reduce overhead)
- Avoid dynamically allocating memory in getxattr (Perf improvement)
- Schedule workers on CPU0 or 0-5 by default (Power saving)
- Fix memory leak for uncached firmware requests
- Speed up cache entry creation (Reduce overhead)
- Avoid dynamic allocations in sel_write_access (Perf improvement)
- Avoid dynamic memory allocation for INITCONTEXTLEN buffers (Reduce overhead)
- Use kmem_cache pool for struct dma_buf_attachment, dmabuf allocations, struct sync_file, sdcardfs_file_info, cgrp_cset_link, kernfs_open_node/file, async_entry (Reduce overhead)
- Always compile core debugfs driver for Android kernels
- Avoid allocating small buffers for map keys and values (Reduce overhead)
- Start killing wakelocks after one minute of idle (Power saving)
- Expedite garbage collection if idle (Power saving)
- Add a timeout to wakelocks globally
- Reduce QoS boosting from Samsung hacks (Power saving)
- Disable full refcount validation (Perf improvement)
- Micro-optimize PID map reads for arm64 while retaining output format (helps with games like Genshin Impact)
- Stop kswapd early when nothing’s waiting for it to free pages
- Don’t stop kswapd on a per-node basis when there are no waiters
- Move frequently used functions to headers and declare them inline (Reduce overhead)
- Don’t hog RCU read lock while optimistically spinning
- Use interruptible wait for genirq, media v4l (Reduce latency)
- Increase watermark scale factor
- Update LZ4 components
- Reduce latency while processing atomic ioctls (Reduce display rendering latency)
- Drop top-app level to 2
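The SCHED_RR timeslice change above is given in jiffies, so the real timeslice depends on the kernel timer frequency (HZ). One jiffy lasts 1000/HZ milliseconds. The short Python sketch below works this out for the two timer rates mentioned in this changelog (100 Hz and 300 Hz); the function name `jiffy_ms` is only for illustration.

```python
def jiffy_ms(hz):
    """Length of one jiffy in milliseconds for a timer frequency in Hz."""
    if hz <= 0:
        raise ValueError("hz must be positive")
    return 1000.0 / hz


if __name__ == "__main__":
    for hz in (100, 300):
        print(f"HZ={hz}: 1 jiffy = {jiffy_ms(hz):.2f} ms")
```

With a 100 Hz timer, one jiffy is 10 ms, a tenth of the old 100 ms default.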
R4-noboost:
- Introduce SchedTune Assist (500% improvement in User experience at minimal cost of battery life)
- Increase the time a task is considered cache-hot (Performance gain)
- Queue requests on their origin CPU (20%-40% gain in some scenarios)
- Do not give sleepers 50% more runtime
- Do not wake idle CPUs to queue same-origin requests (noticeable battery life improvement)
- Do not use IPIs for remote wakeups if idle
- Allow aggressive remote task interruptions
- Do not collect I/O statistics
- Don’t allow userspace to impose restrictions on CPU idle levels (No more random performance drops)
- Implement optimised checksum routine (Performance gain)
- Network Stack optimizations
- Don’t hog RCU read lock while optimistically spinning
- Speed up ioctl by omitting debug names (overall graphics performance boost)
- Optimized memcpy (Performance gain)
- Optimized memmove (Performance gain)
- Optimized memset (Performance gain)
- Enable BPF JIT (Faster settings activities launch)
- Update default wakelock block list (Better battery life)
- Enable wireguard (add wireguard support for those who might need it)
R4-boost:
- Don’t allow userspace to impose restrictions on CPU idle levels (No more random performance drops)
- Implement optimised checksum routine (Performance gain)
- Network Stack optimizations
- Don’t hog RCU read lock while optimistically spinning
- Speed up ioctl by omitting debug names (overall graphics performance boost)
- Optimized memcpy (Performance gain)
- Optimized memmove (Performance gain)
- Optimized memset (Performance gain)
- Enable BPF JIT (Faster settings activities launch)
- Update default wakelock block list (Better battery life)
- Update boost frequency (less heating)
- Change default SCHED_RR timeslice from 100 ms to 1 jiffy
- Use SCHED_RR in place of SCHED_FIFO for all users (lowered latency)
- Micro-optimize PID map reads for arm64 while retaining output format (Should help with certain big games)
- Free dead tasks asynchronously in finish_task_switch() (Performance gain)
- Avoid allocating small buffers for map keys and values
- Avoid dynamic memory allocation for small write buffers (avoid overhead)
- Reduce latency while processing atomic ioctls (lowered latency)
- Enable wireguard (add wireguard support for those who might need it)
R3:
- Drop SchedTune Assist
- Drop DSBoost
- Revert undervolting
- Set timer to 300 Hz
- Boost DDR bus for a short amount of time when zygote forks (faster app launches)
- Use -O3 optimization (slight performance improvement)
- Branch optimization in free slowpath
- Align file struct to 8 bytes
- Disable CRC checks (30% IO speed improvement)
- Optimize modulo operation
- Set initial TCP window size to 64K (speed improvement)
- Use power-efficient workqueues
- Add NEON accelerated XOR implementation
- Set dirty ratio to 40%
- Increase vmstat interval
- Swap pages one at a time (helps with extreme memory pressure scenarios)
R2:
- Fix error “There’s an internal problem with your system”
- Enable Write Back Caching (improve flash storage lifespan)
- Don’t fail on wakeup (do not let userspace abuse alarmtimers)
- Reduce the opportunity for sleepers to preempt (reduce jitter)
- Enable I/O throttling (reduce latency)
- Introduce SchedTune Assist (faster app launches and power efficiency)
- Introduce DSBoost driver (smoother experience and lowered jank)
- Introduce devfreq boost driver (butter smooth UI experience, no frame-drops)
- Optimized Console FrameBuffer for up to 70% increase in performance
- Boost DDR bus upon running an atomic ioctl (prevent any and all framedrops)
- Increase limit on SchedTune boost groups
- Undervolt the whole SoC (lowered power usage)
- Process new forks before processing their parent (improve app launch speed)
- Apply boost before frames are updated
- Enable Westwood (lowered network latency and better speeds)
- Prefer to reclaim inode and dentry cache over pagecache
- Increase ratelimit pages value (reduce overhead)
R1:
- Tweak cpuidle governor to enter deep C-state faster (better battery life)
- Set frequency of kernel timer interrupt to 100 Hz (noticeable battery life improvement)
- Add Boeffla Wakelock blocker
- Disable debugging in binder (lowered latency)
- 3x faster integer sqrt
- Disable IO_STAT completely (IO speed improvement)
- ARM64 targeted optimizations
- Make workqueue high priority (decrease delays under pressure)
- Cacheline alignment (performance benefit)
- Don’t let Google crap run in the background (better battery life)
- Make zsmalloc copy rather than mapping pages (performance improvement)
- Disable audit
- Disable auditing for selinux
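The integer sqrt entry above refers to the kernel's `int_sqrt()` helper. For anyone curious how a square root can be computed with only shifts and adds, here is a Python sketch of the bit-by-bit method. It starts from the highest set bit of the input rather than the top of the word, which is the kind of shortcut that speeds up small inputs. This is an illustration under that assumption, not the exact patch used in this kernel.

```python
def int_sqrt(x):
    """Floor of the square root of a non-negative integer, using shifts and adds only."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x <= 1:
        return x
    # Start at the largest power of four that does not exceed x.
    m = 1 << ((x.bit_length() - 1) & ~1)
    y = 0
    while m:
        b = y + m
        y >>= 1
        if x >= b:
            x -= b
            y += m
        m >>= 2
    return y


if __name__ == "__main__":
    for n in (0, 1, 15, 16, 17, 1000000):
        print(n, int_sqrt(n))
```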
Flashing instructions
- Flash the Disable-userspace-slmk module using Magisk Manager
- Reboot to recovery
- Flash the kernel
- Enjoy
Downloads
Sources
- Kernel source - Click here
Note
I built this for my personal use only. My goal was better efficiency, not performance.
Updated on