Shifting bug explanation (#134)

dreamos82 · web-flow · commit 20cdab9b6f46 · 2026-01-01T20:48:46.000+11:00
* Add small paragraph to explain the bitwise operation issue

* Mention that heap merging is also called coalescing

* Fix formula syntax in tar fs chapter

* Minor fix on the literals chapter

* fix typo in tar header table

* Fix typos in tar chapter

* Update tar header section

* Add introduction to coalescing term

* Minor typo fixes on tar chapter

* Fix typos in memory management and tar chapters

* Changes requested
diff --git a/04_Memory_Management/01_Overview.md b/04_Memory_Management/01_Overview.md
@@ -2,9 +2,9 @@
 
 Welcome to the first challenge of our osdev adventure! Memory management in a kernel is a big area, and it can easily get very complex. This chapter aims to breakdown the various layers you might use in your kernel, and explain how each of them is useful.
 
-The design and complexity of a memory manger can vary greatly, a lot depends on what the operating system is designed, and its specific goals. For example if only want mono-tasking os, with paging disabled and no memory protection, it will probably be fairly simple to implement.
+The design and complexity of a memory manager can vary greatly; a lot depends on how the operating system is designed and on its specific goals. For example, if we only want a mono-tasking OS, with paging disabled and no memory protection, it will probably be fairly simple to implement.
 
-In this part we will try to cover a more common use case that is probably what nearly all modern operating system uses, that is a 32/64 operating system with paging enabled, and various forms of memory allocators for the kernel and one for user space.
+In this part we will try to cover a more common use case, probably the one that nearly all modern operating systems use: a 32/64-bit operating system with paging enabled, various forms of memory allocators for the kernel, and one for user space.
 
 In the appendices there is also an additional section on memory protection features available in some CPUs.
 
diff --git a/04_Memory_Management/05_Heap_Allocation.md b/04_Memory_Management/05_Heap_Allocation.md
@@ -87,7 +87,6 @@ So now we have the following situation:
 Now the third `alloc()` call will work similarly to the others, and we can imagine the results. `
 
 What we have so far is already an allocation algorithm, that's easy to implement and very fast!
-Its implementation is very simple:
 
 ```c
 uint8_t *cur_heap_position = 0; //Just an example, in the real world you would use
@@ -308,11 +307,14 @@ struct {
 
 That's it! That's what we need to clean up the code and replace the pointers in the latest with the new struct reference. Since it is just matter of replacing few variables, implementing this part is left to the reader.
 
-### Part 5: Merging
+### Part 5: Coalescing (Merging)
 
 So now we have a basic memory allocator (woo hoo), and we are nearing the end of our memory journey.
 
 In this part we'll see how to help mitigate the *fragmentation* problem. It is not a definitive solution, but this lets us reuse memory in a more efficient way. Before proceeding let's recap what we've done so far.
+
+This solution is known as _coalescing_: an algorithm that merges contiguous smaller blocks of free memory into a bigger one.
+
 We started from a simple pointer to the latest allocated location, and added information in order to keep track of what was previously allocated and how big it was, needed to reuse the freed memory.
 
 We've basically created a list of memory regions that we can traverse to find the next/prev region.
@@ -335,7 +337,7 @@ What the heap will look like after the code above?
 |  6 | F  | X  |  ..  |  X  | 6  | F  |  X | .. | X  | 6  | F  | .. | X  |    |    |
 
 
-Now, all of the memory in the heap is available to allocate (except for the overhead used to store the status of each chunk), and everything looks perfectly fine. But now the code keeps executing, and it will arrive at the following instruction:
+Now, all of the memory in the heap is available to allocate (except for the overhead used to store the status of each chunk), and everything looks perfectly fine. But the code keeps executing, and it will arrive at the following instruction:
 
 ```c
 alloc(7);
diff --git a/08_VirtualFileSystem/03_TarFileSystem.md b/08_VirtualFileSystem/03_TarFileSystem.md
@@ -33,14 +33,15 @@ As anticipated above, the header structure is a fixed size struct of 512 bytes.
 | 148 | 8 	| Checksum for header record |
 | 156 | 1 	| Type flag |
 | 157 | 100 | Name of linked file |
-| 57  | 6 	| UStar indicator, "ustar", then NULL |
+| 257  | 6 	| UStar indicator, "ustar", then NULL |
 | 263 | 2 	| UStar version, "00" (it is a string) |
 | 265 |	32 	| Owner user name |
 | 297 |	32 	| Owner group name |
 | 329 |	8 	| Device major number |
 | 337 |	8 	| Device minor number |
 | 345 |	155 | Filename prefix |
 
+The sum of all the field sizes, however, is not 512 bytes but 500, so the remaining 12 bytes are filled with zeros.
 To ensure portability all the information on the header are encoded in `ASCII`, so we can use the `char` type to store the information into those fields. Every record has a `type` flag, that says what kind of resource it represent, the possible values depends on the type of tar we are supporting, for the `ustar` format the possible values are:
 
 | Value | Meaning |
@@ -57,9 +58,9 @@ The _name of linked file_ field refers to symbolic links in the unix world, when
 
 The USTar indictator (containing the string `ustar` followed by NULL), and the version field are used to identify the format being used, and the version field value is "00".
 
-The `filename prefix` field, present only in the `ustar`, this format allows for longer file names, but it is splitted into two parts the `file name` field ( 100 bytes) and the `filename prefix` field (155 bytes)
+The `filename prefix` field is present only in the `ustar` format. This format allows for longer file names, but the name is split into two parts: the `file name` field (100 bytes) and the `filename prefix` field (155 bytes).
 
-The other fields are either self-explanatory (like uid/gid) or can be left as 0 (TO BE CHECKED) the only one that needs more explanation is the `file size` field because it is expressed  as an octal number encoded in ASCII. This means we need to convert an ascii octal into a decimal integer. Just to remind, an `octal` number is a number represetend in base 8, we can use digits from 0 to 7 to represent it, similar to how binary (base 2) only have 0 and 1, and hexadecimal (base 16) has 0 to F. So for example:
+The other fields are either self-explanatory (like uid/gid) or can be left as 0. The only one that needs more explanation is the `file size` field, because it is expressed as an octal number encoded in ASCII. This means we need to convert an ASCII octal string into an integer, skipping the last byte (the 12th), which is historically left as `NULL` (0). As a reminder, an `octal` number is a number represented in base 8: we can use the digits 0 to 7 to represent it, similar to how binary (base 2) only has 0 and 1, and hexadecimal (base 16) has 0 to F. So for example:
 
 ```
 octal 12 = hex A = bin 1010
@@ -69,7 +70,7 @@ In C an octal number is represented adding a `0` in front of the number, so for
 
 But that's not all, we also have that the number is represented as an `ascii` characters, so to get the decimal number we need to:
 
-1. Convert each ascii digit into decimal, this should be pretty easy to do, since in the ascii table the digits are placed in ascending order starting from 0x30 ( `´0'` ), to get the digit we need just to subtract the `ascii` code for the 0 to the char supplied
+1. Convert each ASCII digit into its numeric value. This is easy, since in the ASCII table the digits are placed in ascending order starting from 0x30 (`'0'`): to get the digit we just need to subtract the ASCII code of `'0'` from the supplied char.
 2.  To obtain the decimal number from an octal we need to multiply each digit per `8^i` where i is the digit position (rightmost digit is 0) and sum their results. For example 37 in octal is:
 
 ```c
@@ -97,9 +98,9 @@ The picture below show how data is stored into a tar archive.
 
 To move from the first header to the next we simply need to use the following formula:
 
-$$ next\_header = header\_ptr + header\_size + file\_size $$
+$$ next\_{header} = header\_{ptr} + header\_{size} + file\_{size} $$
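The formula above can be sketched in C as follows. This is only a minimal sketch, not the driver's actual code: the names `tar_parse_octal`, `tar_next_header` and the `TAR_*` constants are hypothetical, `file_size` is read from the 12-byte octal ASCII field at offset 124 of the header, and we assume, as is usual for tar archives, that file data is padded up to a multiple of 512 bytes, so the file size is rounded up before being added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAR_RECORD_SIZE 512u /* header size; also the assumed padding unit */
#define TAR_SIZE_OFFSET 124u /* offset of the `file size` field            */
#define TAR_SIZE_LENGTH 12u  /* octal ASCII digits plus the trailing NULL  */

/* Hypothetical helper: converts an octal ASCII field into an integer.
 * It stops at the first character that is not an octal digit (for
 * example the trailing NULL byte). */
static uint64_t tar_parse_octal(const char *field, size_t length)
{
    uint64_t value = 0;
    for (size_t i = 0; i < length; i++) {
        if (field[i] < '0' || field[i] > '7')
            break;
        /* Shift the previous digits one octal position, add the new one. */
        value = value * 8 + (uint64_t)(field[i] - '0');
    }
    return value;
}

/* Hypothetical helper: returns the address of the header that follows
 * the one at `header_ptr`, assuming the archive is fully in memory. */
static const uint8_t *tar_next_header(const uint8_t *header_ptr)
{
    assert(header_ptr != NULL);
    uint64_t file_size = tar_parse_octal(
        (const char *)header_ptr + TAR_SIZE_OFFSET, TAR_SIZE_LENGTH);
    /* Round the data size up to a whole number of 512-byte records. */
    uint64_t padded_size =
        (file_size + TAR_RECORD_SIZE - 1) / TAR_RECORD_SIZE * TAR_RECORD_SIZE;
    return header_ptr + TAR_RECORD_SIZE + padded_size;
}
```

With this sketch, a header describing a 31-byte file (octal `37`) is followed by its next header 1024 bytes later: 512 bytes of header plus one padded data record.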
 
-The lookup function then will be in the form of a loop. The first thing we'll need to know is when we've reached the end of the archive. As mentioned above, if there are two or more zero-filled records, it indicated the end. So while searching, we need to make sure that we keep track of the number of zeroed records. The main lookup loop should be similar to the following pseudo-code:
+The lookup function then will be in the form of a loop. The first thing we'll need to know is when we've reached the end of the archive. As mentioned above, if there are two or more zero-filled records, it indicates the end. So while searching, we need to make sure that we keep track of the number of zeroed records. The main lookup loop should be similar to the following pseudo-code:
 
 ```c
 int zero_counter = 0;
@@ -189,18 +190,18 @@ In our scenario there is no really need to close a file from a fs driver point o
 
 ## And Now from A VFS Point Of View
 
-Now that we have a basic implementation of the tar file system we need to make it accessible to the VFS layer. To do we need to do two things: load the filesystem into memory and populate at least one mountpoint_t item. Since technically there are no fs loaded yet we can add it as the first item in our list/array. We have seent the `mountpoint_t` type already in the previous chapter, but let's review what are the fields available in this data structure:
+Now that we have a basic implementation of the tar file system we need to make it accessible to the VFS layer. To do it we need to do two things: load the filesystem into memory and populate at least one `mountpoint_t` item. Since technically there are no fs loaded yet we can add it as the first item in our list/array. We have seen the `mountpoint_t` type already in the previous chapter, but let's review what are the fields available in this data structure:
 
 * The file system name (it can be whatever we want).
 * The mountpoint (is the folder where we want to mount the filesystem), in our case since we have not mountpoints loaded, a good idea will be to mount it at "/".
-* The file_operations field, that will contain the pointer to the fs functions to open/read/close/write files, in this field we are going to place the fs driver function we just created..
+* The `file_operations` field, which will contain the pointers to the fs functions to open/read/close/write files. In this field we are going to place the fs driver functions we just created.
 
-The file_operation field will be loaded as follows (this is according to our current implementation):
+The `file_operations` field will be loaded as follows (this is according to our current implementation):
 
-* The open function will be the ustar_open function.
-* The read function will be the ustar_read function.
-* We don't need a close function since we can handle it directly in the VFS, so we will set it to NULL.
-* As well as we don't need a write function since our fs will be read only, so it can be set to NULL.
+* The `open` function will be the `ustar_open` function.
+* The `read` function will be the `ustar_read` function.
+* We don't need a `close` function since we can handle it directly in the VFS, so we will set it to NULL.
+* Likewise, we don't need a `write` function since our fs will be read-only, so it can be set to NULL.
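The list above could translate into something like the following. It is only a sketch under stated assumptions: the layouts of `mountpoint_t` and `fs_operations_t`, the function signatures, and the helper `ustar_fill_mountpoint` are hypothetical and must be adapted to the types defined in the previous chapter; the two `ustar_*` stubs stand in for the real driver functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts: the real ones come from the VFS chapter. */
typedef struct {
    int  (*open)(const char *path, int flags);
    int  (*close)(int fd);
    long (*read)(int fd, void *buffer, size_t nbytes);
    long (*write)(int fd, const void *buffer, size_t nbytes);
} fs_operations_t;

typedef struct {
    char name[32];        /* file system name, whatever we want */
    char mountpoint[64];  /* folder where the fs is mounted     */
    fs_operations_t file_operations;
} mountpoint_t;

/* Stubs standing in for the driver functions written earlier. */
static int ustar_open(const char *path, int flags)
{
    (void)path; (void)flags;
    return -1;
}

static long ustar_read(int fd, void *buffer, size_t nbytes)
{
    (void)fd; (void)buffer; (void)nbytes;
    return -1;
}

/* Hypothetical helper: fills a mountpoint item for the tar fs at "/". */
static void ustar_fill_mountpoint(mountpoint_t *mp)
{
    assert(mp != NULL);
    memset(mp, 0, sizeof *mp);
    strncpy(mp->name, "ustar", sizeof mp->name - 1);
    strncpy(mp->mountpoint, "/", sizeof mp->mountpoint - 1);
    mp->file_operations.open  = ustar_open;
    mp->file_operations.read  = ustar_read;
    mp->file_operations.close = NULL; /* handled directly by the VFS */
    mp->file_operations.write = NULL; /* the fs is read-only         */
}
```

Because `close` and `write` are left as NULL, the VFS layer has to check the pointers before calling them.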
 
 Loading the fs in memory instead will depend on the booting method we have chosen, since every boot manager/loader has its different approach this will be left to the boot manager used documentation.
 
@@ -258,7 +259,7 @@ struct tar_list_item {
 
 And using the new datatype initialize the list accordingly.
 
-Now when the file system is accessed for the first time we can initialize this list, and use it to search for the files, saving a lot of time and resources, and it can makes things easier to for the lookup and read function.
+Now, when the file system is accessed for the first time, we can initialize this list and use it to search for files, saving a lot of time and resources; it can also make things easier for the lookup and read functions.
 
 Another limitation of our driver is that it expects for the tar to be fully loaded into memory, while we know that probably file system will be stored into an external device, so a good idea is to make the driver aware of all possible scenarios.
 
diff --git a/99_Appendices/C_Language_Info.md b/99_Appendices/C_Language_Info.md
@@ -96,6 +96,31 @@ It is worth mentioning that inline assembly syntax is the At&t syntax, so the us
 asm("movl $5, %rcx;");
 ```
 
+## Dealing With Literals and Bitwise Operations
+
+There are some subtle bugs that can be encountered when using immediate values in C, due to the types C assigns to integer literals and to the integer promotion rules.
+
+Let's imagine we have a 64-bit variable, and we need to do a bitwise operation like setting the bit at position `x`. This is easily achieved using the _left shift_ (`<<`) operator combined with an _or_ (`|=`), like in the following example:
+
+```c
+uint64_t example_var = 0;
+example_var |= (1 << x); // buggy for large x: the literal `1` is an int
+```
+
+We run a few tests with `x = 1, 2, 10, 20` and everything works fine, so what is the issue? The issue appears when the shift is 31 or above, because of the type C gives to the literal.
+
+In the above example `1` is an integer literal, and by default its type is `int` (32 bits on most platforms). The shift is performed in the type of its left operand, regardless of the type of the variable the result is stored into. With `x = 31` the result no longer fits in a signed `int`, and with `x >= 32` we are shifting by a number of positions greater than or equal to the width of the type: both cases are undefined behavior.
+
+So what are the solutions? Below are a few examples of how to fix it:
+
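+```c
+#define ONE 1ULL
+const uint64_t one = 1;
+
+uint64_t example_one = 0, example_two = 0, example_three = 0;
+
+example_one |= one << 42;    // left operand is a uint64_t variable
+example_two |= ONE << 42;    // left operand is an unsigned long long literal
+example_three |= 1ULL << 42; // same as above, without the macro
+```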
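To avoid repeating the fix at every call site, the widening can be wrapped into small helpers. The following is a minimal sketch, not part of the original text: `set_bit64` and `bit_mask64` are hypothetical names, and `assert` is used only to make the remaining precondition explicit (a shift by 64 or more is still undefined, even on a 64-bit operand).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns `value` with bit `x` set.
 * UINT64_C(1) makes the literal 64 bits wide BEFORE the shift happens. */
static inline uint64_t set_bit64(uint64_t value, unsigned x)
{
    assert(x < 64); /* shifting by >= 64 would still be undefined */
    return value | (UINT64_C(1) << x);
}

/* Hypothetical helper: same idea, using an explicit cast on the literal. */
static inline uint64_t bit_mask64(unsigned x)
{
    assert(x < 64);
    return (uint64_t)1 << x;
}
```

For example, `set_bit64(0, 42)` yields `0x0000040000000000`, while the buggy `(1 << 42)` has no defined result when `int` is 32 bits wide.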
+
 ## C +(+) assembly together - Calling Conventions
 
 Different C compilers feature a number of [calling conventions](https://en.wikipedia.org/wiki/X86_calling_conventions),
diff --git a/99_Appendices/J_Updates.md b/99_Appendices/J_Updates.md
@@ -66,3 +66,4 @@ Sixth Book Release
 * _Stivale 2_ protocol sections have been replaced with Limine protocol, since _stivale2_ has been deprecated.
 * Add a complete exammple of how to create an ELF executable for our kernel
 * Typo and error fixes
+* New short paragraph to explain the behaviour of literals with bitwise operators.