Apple M1: architecture, new features and characteristics

On this website we have covered all kinds of processors, usually those compatible with the x86 register and instruction set. Given the controversy surrounding the Apple M1 in recent months, we decided to write an article about its architecture.

Apple M1 architecture

Apple M1 is not a processor, it is a SoC

The first thing to keep in mind is that the Apple M1 is not a CPU like those from Intel or AMD but a full-fledged SoC. In addition to the CPU, it includes a number of specialized units of various kinds, namely:

  • CPU, which we’ll talk about later in this article.
  • GPU, which processes graphics.
  • Image Signal Processor (ISP), which processes images from the camera.
  • Digital Signal Processor (DSP), used for tasks such as decoding music files and for very complex mathematical operations.
  • Neural Processing Unit (NPU), a processor dedicated to AI.
  • Video encoder and decoder for playing and recording movies.
  • Data encryption blocks for security.
  • I/O blocks that control external peripherals and the data sent to them.
  • A large last-level cache, called the System Level Cache, which the unified memory requires.

Covering all these modules would take a book, so we will focus only on the CPU, to answer the question of how its performance compares with that of PC CPUs.

When there is no variety in hardware, it is easier to optimize programs

One of the differences between PCs and other platforms is that for each component there are a thousand different products, so an incredible number of configurations end up being created. In Apple computers starting with the M1, on the other hand, all the hardware except RAM and storage is in Apple's SoC.

What does this allow? Basically, it allows applications to be optimized for a single configuration. This is no different from what happens with consoles, which stay on the market for several years and whose code keeps being optimized even five years after release. On a PC, by contrast, the freedom to choose components means that code cannot be optimized for any particular configuration.

On a PC, when we run a program, everything is sent to the CPU, even though some pieces of code would be better executed on units far more specialized than the CPU. The huge variety of PC hardware, however, turns optimizing code to use those other hardware blocks into a Sisyphean task.

Unified memory

One of Apple's secret weapons against the PC is unified memory. First of all, we must clarify that unified memory does not simply mean that different elements use the same physical memory; it means that all the elements of the SoC interpret memory in the same way.

That is, when the GPU modifies the data at a memory address, the change is directly visible to the rest of the elements of the Apple M1 at that same address. Even in PCs and derived architectures that share physical memory, it is necessary to use DMA units that copy data from the RAM space assigned to one block to the space assigned to another, which increases latency and reduces the possibilities for the units to collaborate. …

So, thanks to the M1's unified memory, macOS developers can run some code on the units that execute it faster than the CPU.
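
The difference between copying data between memory regions and sharing one address space can be sketched with a toy model. The classes SeparateMemory and UnifiedMemory and the round_trip function are hypothetical illustrations, not any real driver or Apple API; the model only counts the bytes a DMA-style transfer would move in each case.

```python
class SeparateMemory:
    """Hypothetical model: each unit owns its own RAM region, so sharing
    data means a DMA-style copy from one region to the other."""

    def __init__(self):
        self.regions = {"cpu": {}, "gpu": {}}
        self.bytes_copied = 0

    def write(self, unit, name, data):
        self.regions[unit][name] = bytes(data)

    def read(self, unit, name):
        return self.regions[unit][name]

    def share(self, src, dst, name):
        buf = self.regions[src][name]
        self.regions[dst][name] = buf      # model a DMA transfer into dst's region
        self.bytes_copied += len(buf)      # and count the bytes it would move


class UnifiedMemory:
    """Hypothetical model: every unit sees the same address space, so
    sharing is just handing over the same buffer; nothing is copied."""

    def __init__(self):
        self.space = {}
        self.bytes_copied = 0

    def write(self, unit, name, data):
        self.space[name] = bytes(data)

    def read(self, unit, name):
        return self.space[name]

    def share(self, src, dst, name):
        pass  # dst already sees the same bytes at the same address


def round_trip(mem, size):
    """CPU prepares a buffer, the GPU transforms it, the CPU reads the result."""
    mem.write("cpu", "frame", bytes(size))                  # CPU fills it with zeros
    mem.share("cpu", "gpu", "frame")                        # hand it to the GPU
    data = mem.read("gpu", "frame")
    mem.write("gpu", "frame", bytes(b + 1 for b in data))   # GPU "processes" it
    mem.share("gpu", "cpu", "frame")                        # hand the result back
    return mem.read("cpu", "frame")


for mem in (SeparateMemory(), UnifiedMemory()):
    result = round_trip(mem, 4096)
    print(type(mem).__name__, result[:3], mem.bytes_copied)
# SeparateMemory b'\x01\x01\x01' 8192
# UnifiedMemory b'\x01\x01\x01' 0
```

Both models produce the same result; only the separate-memory model pays for two full copies of the buffer. Real hardware also involves caches and coherence traffic, which this sketch ignores.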

Apple M1 high-performance cores: Firestorm

Although the Apple M1 is a multi-core CPU, it actually uses two different types of cores. On one hand, there are energy-efficient but lower-performance cores called Icestorm; on the other, high-performance but less energy-efficient cores called Firestorm. We will focus on the Firestorm cores, since they are the ones Apple pits against high-performance x86 processors.

The Apple M1 has only four Firestorm cores, and it is with them that Apple decided to take on high-performance PC processors. To understand the reason for their performance, we first need to discuss a topic common to all modern processors.

Decoders in out-of-order processors

The first step of the second phase of the instruction cycle, decoding, converts instructions into microinstructions, which are much simpler and therefore easier to implement in silicon. A microinstruction on its own is not a complete instruction, since it does not represent the whole action; several of them combine to carry out more complex instructions.
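
As an illustration of how one complex instruction can be split into simpler microinstructions, here is a minimal sketch. The instruction syntax, the micro-op names ("load", "add", "store") and the temporary register tmp0 are all hypothetical; real decoders use internal formats that are not publicly documented in this form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    op: str      # primitive action, e.g. "load", "add", "store" (hypothetical names)
    dst: str     # destination register or memory address
    srcs: tuple  # source operands


def crack(instruction: str) -> list:
    """Split a toy CISC-style instruction into micro-ops.

    Supported (hypothetical) forms:
      "add [addr], reg"  -> load, add, store
      "add reg, reg"     -> add
      "mov reg, [addr]"  -> load
    """
    mnemonic, operands = instruction.split(maxsplit=1)
    dst, src = (x.strip() for x in operands.split(","))
    if mnemonic == "add" and dst.startswith("["):
        addr = dst.strip("[]")
        return [
            MicroOp("load", "tmp0", (addr,)),       # read the memory operand
            MicroOp("add", "tmp0", ("tmp0", src)),  # do the arithmetic
            MicroOp("store", addr, ("tmp0",)),      # write the result back
        ]
    if mnemonic == "add":
        return [MicroOp("add", dst, (dst, src))]    # already a single primitive
    if mnemonic == "mov" and src.startswith("["):
        return [MicroOp("load", dst, (src.strip("[]"),))]
    raise ValueError(f"unsupported instruction: {instruction}")


for uop in crack("add [counter], rax"):
    print(uop)
```

A single read-modify-write instruction becomes three microinstructions, none of which performs the whole action on its own.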

Hence, internally, no modern CPU executes the program binary as is; each one first converts instructions into sets of microinstructions. But the matter does not end there: in a modern processor execution is out of order, which means that instructions are executed not in program sequence but in the order in which the execution units, and the data each instruction needs, become available.

Thus, after converting an instruction into microinstructions, the decoder places them in what is called a reorder buffer. There they are listed together with their position in the correct program order, and they are dispatched as the various execution units become available. This way the program runs more efficiently, since instructions do not have to wait for a particular execution unit to be freed, and the results are then written back in the correct program order.
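
The behavior described above (execute when ready, commit in program order) can be sketched with a toy cycle-by-cycle simulator. This is a heavy simplification under stated assumptions: the micro-op names, latencies and unit counts are hypothetical, every unit is assumed fully pipelined, and real cores split the reorder buffer from separate schedulers or reservation stations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Uop:
    name: str
    unit: str                      # execution-unit type it needs, e.g. "alu"
    latency: int                   # cycles until its result is ready
    deps: tuple = ()               # names of earlier uops whose results it reads
    done_at: Optional[int] = None  # cycle its result becomes ready (None = not issued)


def simulate(program, units):
    """Return (execution order, retirement order) for uops given in program order.
    `units` maps each unit type to how many can start a uop per cycle; it must
    cover every unit type used by the program."""
    rob = list(program)                    # reorder buffer, kept in program order
    by_name = {u.name: u for u in program}
    issued, retired = [], []
    cycle = 0
    while len(retired) < len(program):
        free = dict(units)
        # Issue: any waiting uop whose inputs are ready and whose unit is free
        # may start, regardless of its position in program order.
        for u in rob:
            if u.done_at is not None:
                continue
            inputs_ready = all(
                by_name[d].done_at is not None and by_name[d].done_at <= cycle
                for d in u.deps
            )
            if inputs_ready and free.get(u.unit, 0) > 0:
                free[u.unit] -= 1
                u.done_at = cycle + u.latency
                issued.append(u.name)
        # Retire: only the oldest entry may leave, so results commit in order.
        while rob and rob[0].done_at is not None and rob[0].done_at <= cycle:
            retired.append(rob.pop(0).name)
        cycle += 1
    return issued, retired


def demo_program():
    # Hypothetical micro-op sequence, listed in program order.
    return [
        Uop("load", "mem", 4),              # slow memory access
        Uop("add1", "alu", 1, ("load",)),   # must wait for the load
        Uop("mul", "alu", 3),               # independent work
        Uop("add2", "alu", 1, ("mul",)),    # depends only on mul
    ]


issued, retired = simulate(demo_program(), {"alu": 1, "mem": 1})
print("execution order:", issued)    # ['load', 'mul', 'add2', 'add1']
print("retirement order:", retired)  # ['load', 'add1', 'mul', 'add2']
```

While add1 waits for the slow load, the independent mul and add2 run ahead, yet all four results still leave the reorder buffer in program order.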

The secret weapon of the Apple M1's Firestorm cores: its decoder

The instruction decoding phase is the second phase of the instruction cycle. In any processor that works in parallel, the decoder must be able to process multiple instructions at the same time and send them to the appropriate execution units to be executed.

The M1's advantage? Its decoder can handle 8 instructions simultaneously, which makes it the widest processor in this regard: it can process more instructions in parallel, which in turn allows Apple to dispatch more instructions per cycle. The reason Apple was able to do this has to do with the nature of the ARM instruction set compared with x86, especially when it comes to decoding.

ARM instructions have the advantage of a fixed size (32 bits in AArch64), which means that in the binary every fixed-size group of bits is one instruction. x86 instructions, on the other hand, vary in size, from 1 to 15 bytes. This means that the code must go through several decoding stages before it becomes microinstructions. The consequence? The hardware dedicated to decoding instructions not only takes up much more space and consumes more power, but can also decode fewer instructions simultaneously for the same area.
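
Why fixed-size instructions are easier to decode in parallel can be shown by how the start of each instruction is found. In the sketch below, the fixed-length case uses 4-byte instructions as in AArch64. The variable-length encoding is hypothetical (the low 4 bits of the first byte give the length); real x86 length depends on prefixes, opcode, ModRM/SIB, displacement and immediate fields, which makes the real problem harder, not easier.

```python
def fixed_boundaries(code: bytes, size: int = 4) -> list:
    """Fixed-length ISA: every start offset is known up front (0, 4, 8, ...),
    so several decoders can each take one instruction without waiting."""
    return list(range(0, len(code) - size + 1, size))


def variable_boundaries(code: bytes) -> list:
    """Toy variable-length ISA (hypothetical encoding): the low 4 bits of an
    instruction's first byte give its length in bytes (1-15). Each start
    offset depends on the previous instruction's length, so this scan is
    inherently sequential."""
    starts, pos = [], 0
    while pos < len(code):
        length = code[pos] & 0x0F
        if length == 0:
            raise ValueError(f"invalid instruction length at offset {pos}")
        starts.append(pos)
        pos += length
    return starts


print(fixed_boundaries(bytes(16)))     # [0, 4, 8, 12]
toy = bytes([0x03, 0, 0, 0x01, 0x05, 0, 0, 0, 0, 0x02, 0])
print(variable_boundaries(toy))        # [0, 3, 4, 9]
```

In the fixed case, instruction i always starts at byte 4 × i; in the variable case, the fourth instruction cannot be located until the first three have been measured, which is the work that extra x86 decoding hardware has to do quickly.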

And here we see the huge advantage of the M1. How many complete decoders do Intel and AMD processors have? Typically four, only half as many. This gives the M1's Firestorm cores the ability to process twice as many instructions simultaneously as Intel and AMD processors.
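
The effect of decoder width on the front end is simple arithmetic. The sketch below assumes the ideal case in which the decoder always finds a full group of instructions ready every cycle; the function name decode_cycles is illustrative only.

```python
import math


def decode_cycles(n_instructions: int, width: int) -> int:
    """Minimum cycles a decoder that handles `width` instructions per cycle
    needs for n instructions, assuming it is never starved (ideal case)."""
    return math.ceil(n_instructions / width)


for label, width in [("4-wide decoder", 4), ("8-wide decoder", 8)]:
    print(label, decode_cycles(1000, width))
# 4-wide decoder 250
# 8-wide decoder 125
```

This is only a peak figure: it bounds how fast instructions can enter the core, not how fast the rest of the pipeline completes them.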

Apple M1 vs. Intel and AMD

Executing twice as many instructions does not mean completing twice as much work. The counterpart for ARM-based cores is that they need more, simpler instructions, and therefore more clock cycles, to execute the same program. Thus, an x86 core of the same width would be much more powerful than an ARM one, but it would require more transistors and a much larger, more complex processor.
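
This trade-off is usually expressed with the classic performance equation: execution time = instruction count / (IPC × clock frequency). The sketch below applies it with purely hypothetical numbers, chosen only to show that doubling IPC does not double speed if the same program needs more instructions; they are not measurements of any real processor.

```python
def exec_time_seconds(instructions: float, ipc: float, freq_hz: float) -> float:
    """Classic performance equation: time = instructions / (IPC * frequency)."""
    return instructions / (ipc * freq_hz)


# Hypothetical values, same clock speed for both cores:
# core A runs the program in 1.0e9 instructions at IPC 2;
# core B needs 20% more instructions but sustains twice the IPC.
t_a = exec_time_seconds(1.0e9, ipc=2.0, freq_hz=3.0e9)
t_b = exec_time_seconds(1.2e9, ipc=4.0, freq_hz=3.0e9)
print(f"A: {t_a:.4f} s, B: {t_b:.4f} s, speedup: {t_a / t_b:.2f}x")
```

With these assumed numbers, doubling IPC yields about a 1.67x speedup rather than 2x, because part of the gain is consumed by the larger instruction count.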

Over time, both AMD and Intel will increase the IPC (instructions per cycle) of their processors, but they are limited by the complexity of the x86 instruction set and its decoder. It’s not that they can’t make an x86 processor with eight decoders, but that such a processor would be too big to be commercially viable, so they would have to wait for new manufacturing nodes before increasing IPC per core.
