fdt: Fix fdt_pack_reg() on 64-bit platforms

When "memory" node is being processed in fdt_pack_reg() on ARM64
platforms, an unaligned bus access might happen, which leads to
"synchronous abort" CPU exception. Consider next dts example:

    / {
        #address-cells = <2>;
        #size-cells = <1>;

        memory@80000000 {
            device_type = "memory";
            reg = <0x0 0x80000000 0x3ab00000>,
                  <0x0 0xc0000000 0x40000000>,
                  <0x8 0x80000000 0x80000000>;
        };
    };

After fdt_pack_reg() reads the first addr/size entry from such memory
node, the "p" pointer becomes 12 bytes shifted from its original value
(8 bytes for two address cells + 4 bytes for one size cell). So now it's
not 64-bit aligned, and an attempt to do 64-bit bus access to that
address will cause an abort like this:

    "Synchronous Abort" handler, esr 0x96000021, far 0xba235efc

This issue was originally reported by David Virag [1] who observed it
happening on Samsung Exynos7885 SoC (ARM64), and later the same issue
was observed on Samsung Exynos850 (ARM64).

Fix the issue by using put_unaligned_be64() helper, which takes care of
possible unaligned 64-bit accesses. That solution was proposed by Simon
Glass in the original thread [1].

[1] https://lists.denx.de/pipermail/u-boot/2023-July/522074.html

Fixes: 739a01ed8e02 ("fdt_support: fix an endian bug of fdt_fixup_memory_banks")
Suggested-by: Simon Glass <sjg@google.com>
Reported-by: David Virag <virag.david003@gmail.com>
Closes: https://lists.denx.de/pipermail/u-boot/2023-July/522074.html
Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Reviewed-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
diff --git a/boot/fdt_support.c b/boot/fdt_support.c
index 090d82e..9844c70 100644
--- a/boot/fdt_support.c
+++ b/boot/fdt_support.c
@@ -17,6 +17,7 @@
 #include <linux/ctype.h>
 #include <linux/types.h>
 #include <asm/global_data.h>
+#include <asm/unaligned.h>
 #include <linux/libfdt.h>
 #include <fdt_support.h>
 #include <exports.h>
@@ -421,13 +422,13 @@
 
 	for (i = 0; i < n; i++) {
 		if (address_cells == 2)
-			*(fdt64_t *)p = cpu_to_fdt64(address[i]);
+			put_unaligned_be64(address[i], p);
 		else
 			*(fdt32_t *)p = cpu_to_fdt32(address[i]);
 		p += 4 * address_cells;
 
 		if (size_cells == 2)
-			*(fdt64_t *)p = cpu_to_fdt64(size[i]);
+			put_unaligned_be64(size[i], p);
 		else
 			*(fdt32_t *)p = cpu_to_fdt32(size[i]);
 		p += 4 * size_cells;