NES PPU

From NESdevWiki

The NES PPU, or Picture Processing Unit, generates a composite video signal with 240 lines of pixels, designed to be received by a television. When the Famicom chipset was designed in the early 1980s, it was considered quite an advanced 2D picture generator for video games.

It has its own address space, which typically contains 10 kilobytes of memory: 8 kilobytes of ROM or RAM on the Game Pak (possibly more with one of the common Mappers) to store the shapes of background and sprite tiles, plus 2 kilobytes of RAM in the console to store a map or two. Two separate, smaller address spaces hold a palette, which controls which colors are associated with various indices, and OAM (Object Attribute Memory), which stores the position, orientation, shape, and color of the sprites, or independent moving objects. These are internal to the PPU itself and use dynamic memory (which will slowly decay if the PPU is not rendering data).

Registers exposed to CPU

The PPU exposes only eight memory-mapped registers to the CPU. These nominally sit at $2000 through $2007 in the CPU's address space, but because they're incompletely decoded, they're mirrored every 8 bytes from $2008 through $3FFF, so a write to $3456 is the same as a write to $2006.
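For emulator authors, the mirroring rule amounts to keeping only the low three address bits. The Python sketch below illustrates this; the function name `canonical_ppu_register` is hypothetical and chosen only for this example.

```python
def canonical_ppu_register(cpu_addr: int) -> int:
    """Map a CPU address in $2000-$3FFF to the register it aliases ($2000-$2007)."""
    if not 0x2000 <= cpu_addr <= 0x3FFF:
        raise ValueError("address is outside the PPU register range")
    # Only the low three address bits are decoded, so the pattern repeats every 8 bytes.
    return 0x2000 | (cpu_addr & 0x0007)
```

With this helper, `canonical_ppu_register(0x3456)` resolves to the PPUADDR register at $2006, matching the example in the text.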

Immediately after powerup, the PPU must wait at least one full frame before it is stable enough to operate normally. The first thing that should be done on startup is to write a zero byte to registers PPUCTRL and PPUMASK (to disable rendering and NMIs) and then wait for bit 7 of PPUSTATUS to be set twice. Some programs will wait for $2002 bit 7 to be set, initialize hardware other than the PPU (such as zeroing CPU RAM), and then wait for $2002 bit 7 to be set again.

PPUCTRL ($2000)

Various flags controlling PPU operation (write)

76543210
||||||||
||||||++- Base nametable address
||||||    (0 = $2000; 1 = $2400; 2 = $2800; 3 = $2C00)
|||||+--- VRAM address increment per CPU read/write of PPUDATA
|||||     (0: increment by 1, going across; 1: increment by 32, going down)
||||+---- Sprite pattern table address for 8x8 sprites (0: $0000; 1: $1000)
|||+----- Background pattern table address (0: $0000; 1: $1000)
||+------ Sprite size (0: 8x8; 1: 8x16)
|+------- PPU master/slave select (has no effect on the NES)
+-------- Generate an NMI at the start of the
          vertical blanking interval (0: off; 1: on)

Equivalently, bits 0 and 1 are the most significant bits of the scrolling coordinates (see Nametables below and PPU Scrolling):

76543210
      ||
      |+- 1: Add 256 to the X scroll position
      +-- 1: Add 240 to the Y scroll position
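As a rough illustration of the two bit diagrams above, the following Python sketch unpacks a value written to PPUCTRL into named fields. The function name and the dictionary keys are hypothetical; only the bit positions come from the diagrams.

```python
def decode_ppuctrl(value: int) -> dict:
    """Split a PPUCTRL byte into the fields shown in the bit diagrams."""
    value &= 0xFF
    return {
        "base_nametable": 0x2000 + 0x400 * (value & 0x03),
        "vram_increment": 32 if value & 0x04 else 1,
        "sprite_pattern_table": 0x1000 if value & 0x08 else 0x0000,
        "background_pattern_table": 0x1000 if value & 0x10 else 0x0000,
        "sprite_height": 16 if value & 0x20 else 8,
        "nmi_enabled": bool(value & 0x80),
        # Alternate view of bits 0 and 1 as the high bits of the scroll position.
        "scroll_x_offset": 256 if value & 0x01 else 0,
        "scroll_y_offset": 240 if value & 0x02 else 0,
    }
```

Bit 6 (master/slave select) is left out because it has no effect on the NES.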

PPUMASK ($2001)

Screen enable, masking, and intensity (write)

76543210
||||||||
|||||||+- Grayscale (0: normal color; 1: AND all palette entries
|||||||   with 0x30, effectively producing a monochrome display;
|||||||   note that colour emphasis STILL works when this is on!)
||||||+-- Enable background in leftmost 8 pixels of screen (0: clip; 1: display)
|||||+--- Enable sprite in leftmost 8 pixels of screen (0: clip; 1: display)
||||+---- Enable background rendering
|||+----- Enable sprite rendering
||+------ Intensify reds (and darken other colors)
|+------- Intensify greens (and darken other colors)
+-------- Intensify blues (and darken other colors)
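The grayscale bit and the two rendering-enable bits are the ones an emulator typically checks most often. A minimal Python sketch, with hypothetical helper names, might look like this:

```python
def displayed_color(palette_entry: int, ppumask: int) -> int:
    """Return the palette entry as displayed, honoring the grayscale bit (bit 0)."""
    if ppumask & 0x01:
        return palette_entry & 0x30   # keep only the value bits: monochrome
    return palette_entry & 0x3F


def any_rendering_enabled(ppumask: int) -> bool:
    """True if background (bit 3) or sprite (bit 4) rendering is switched on."""
    return bool(ppumask & 0x18)
```

The color emphasis bits (5 to 7) are not modeled here; as the diagram notes, they still apply when grayscale is on.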

PPUSTATUS ($2002)

PPU status (read)

76543210
||||||||
|||+++++- Unimplemented
||+------ Sprite overflow. The PPU can handle only eight sprites on one
||        scanline and sets this bit if it starts dropping sprites.
||        Normally, this triggers when there are 9 sprites on a scanline,
||        but the actual behavior is significantly more complicated.
|+------- Sprite 0 Hit.  Set when a nonzero pixel of sprite 0 'hits'
|         a nonzero background pixel.  Used for raster timing.
+-------- Vertical blank has started (0: not in VBLANK; 1: in VBLANK)

Reading PPUSTATUS will clear D7 of PPUSTATUS and also the address latch used by PPUSCROLL and PPUADDR.

Caution: Reading PPUSTATUS at the exact start of vertical blank will return a 0 in D7 but clear the latch anyway, causing the program to miss frames. See NMI for details.

OAMADDR ($2003)

OAM address (write)

Write the address of OAM you want to access here. Most games just write $00 here and then use OAM_DMA ($4014).

This register also seems to affect Sprite 0 Hit, though it is not yet understood exactly how it does. The upper 5 bits of this register seem to select which SPR-RAM data is used for sprites 0 and 1 (instead of the first 8 bytes of SPR-RAM), though actual behavior varies between resets.

OAMDATA ($2004)

OAM data port (r/w)

Write OAM data here. Writes will increment OAMADDR; reads won't.

Most games access this register through $4014 instead. Reading OAMDATA while the PPU is rendering will expose internal OAM accesses during sprite evaluation and loading; Micro Machines does this.

PPUSCROLL ($2005)

Scroll register (2x write)

After reading PPUSTATUS to reset the address latch, write the horizontal and vertical scroll offsets here just before turning on the screen:

 bit PPUSTATUS
 ; possibly other code goes here
 lda cam_position_x
 sta PPUSCROLL
 lda cam_position_y
 sta PPUSCROLL

Horizontal offsets range from 0 to 255. "Normal" vertical offsets range from 0 to 239. (Values of 240 to 255 are effectively treated as -16 through -1, which causes tile data to be pulled from the attribute table.)

PPUADDR ($2006)

VRAM address register (2x write)

After reading PPUSTATUS to reset the address latch, write the 16-bit address of VRAM you want to access here, upper byte first. Valid addresses are $0000-$3FFF.
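The interaction between the PPUSTATUS read and the two PPUADDR writes can be sketched as a small state machine. This Python model is deliberately simplified: it tracks only a single address and a write toggle, and it ignores the scroll-related behavior covered in loopy's document. The class and attribute names are hypothetical.

```python
class PpuAddressLatch:
    """Simplified model of the PPUSTATUS read and the PPUADDR double write."""

    def __init__(self):
        self.first_write = True   # True: the next PPUADDR write is the upper byte
        self.address = 0
        self.vblank = False

    def read_status(self) -> int:
        value = 0x80 if self.vblank else 0x00
        self.vblank = False       # reading clears D7
        self.first_write = True   # ...and resets the address latch
        return value

    def write_ppuaddr(self, value: int) -> None:
        if self.first_write:
            # Upper byte first; valid addresses are $0000-$3FFF, so keep 6 bits.
            self.address = (self.address & 0x00FF) | ((value & 0x3F) << 8)
        else:
            self.address = (self.address & 0x3F00) | (value & 0xFF)
        self.first_write = not self.first_write
```

The model shows why code reads PPUSTATUS first: if a stray single write has left the toggle in the wrong state, the read puts it back so the next write is taken as the upper byte.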

Access to PPUSCROLL and PPUADDR during screen refresh produces interesting raster effects; the starting position of each scanline can be set to any pixel position in nametable memory. For more information, see "The Skinny on NES Scrolling" by loopy, available from the main site.

PPUDATA ($2007)

VRAM data register (r/w)

When the screen is turned off in PPUMASK or during vertical blank, read or write data from VRAM through this port.

Reads are buffered, so they appear delayed by one read: the first byte read after setting the address is stale and should be discarded. Do not attempt to access this register while the PPU is rendering; if you do, Bad Things™ will happen (e.g. graphical glitches and RAM corruption).
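The need to discard the first byte follows from an internal read buffer. The Python sketch below models that buffer together with the address increment selected in PPUCTRL; it treats PPU memory as a hypothetical flat 16 KiB array and does not model palette reads or access during rendering.

```python
class PpuDataPort:
    """Sketch of PPUDATA access with a one-byte read buffer."""

    def __init__(self, vram: bytearray, increment: int = 1):
        self.vram = vram            # flat $0000-$3FFF array (an assumption of this sketch)
        self.address = 0
        self.increment = increment  # 1 or 32, as selected by PPUCTRL bit 2
        self.read_buffer = 0

    def _advance(self) -> None:
        self.address = (self.address + self.increment) & 0x3FFF

    def read(self) -> int:
        result = self.read_buffer                  # value fetched by the previous read
        self.read_buffer = self.vram[self.address & 0x3FFF]
        self._advance()
        return result

    def write(self, value: int) -> None:
        self.vram[self.address & 0x3FFF] = value & 0xFF
        self._advance()
```

In this model, reading three bytes starting at a freshly set address returns one stale byte followed by the first two bytes at that address, which is why code performs a dummy read first.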

Pattern tables

There are two pattern tables, one at $0000 and one at $1000. Each tile in the pattern table is 16 bytes, made of two planes. The first plane controls bit 0 of the color; the second plane controls bit 1. Any pixel whose color is 0 is background/transparent (represented by '.' in the following diagram):

Bit Planes            Pixel Pattern
$0xx0=$41  01000001
$0xx1=$C2  11000010
$0xx2=$44  01000100
$0xx3=$48  01001000
$0xx4=$10  00010000
$0xx5=$20  00100000         .1.....3
$0xx6=$40  01000000         11....3.
$0xx7=$80  10000000  =====  .1...3..
                            .1..3...
$0xx8=$01  00000001  =====  ...3.22.
$0xx9=$02  00000010         ..3....2
$0xxA=$04  00000100         .3....2.
$0xxB=$08  00001000         3....222
$0xxC=$16  00010110
$0xxD=$21  00100001
$0xxE=$42  01000010
$0xxF=$87  10000111
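The diagram above can be reproduced with a short decoder. This Python sketch combines the two bit planes of one 16-byte tile into rows of 2-bit pixels, printing color 0 as '.' as in the diagram; the function name `decode_tile` is hypothetical.

```python
def decode_tile(tile: bytes) -> list:
    """Turn 16 bytes of pattern data into eight 8-character rows."""
    if len(tile) != 16:
        raise ValueError("a tile is exactly 16 bytes")
    rows = []
    for y in range(8):
        plane0 = tile[y]        # bit 0 of each pixel's color
        plane1 = tile[y + 8]    # bit 1 of each pixel's color
        row = ""
        for x in range(8):
            bit = 7 - x         # the leftmost pixel is the most significant bit
            color = ((plane0 >> bit) & 1) | (((plane1 >> bit) & 1) << 1)
            row += "." if color == 0 else str(color)
        rows.append(row)
    return rows
```

Feeding it the sixteen bytes listed in the diagram ($41, $C2, ... $87) yields the same eight rows shown in the Pixel Pattern column.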

OAM

OAM (Object Attribute Memory) contains a display list of up to 64 sprites, where each sprite's information occupies 4 bytes.

Byte 0

Y position of top of sprite

Sprite data is delayed by one scanline; you must subtract 1 from the sprite's Y coordinate before writing it here. Hide a sprite by writing any value in $EF-$FF here.

Byte 1

Tile index number

For 8x8 sprites, the tile number of this sprite. For 8x16 sprites:

76543210
||||||||
|||||||+- Bank ($0000 or $1000) of tiles
+++++++-- Tile number of top of sprite (0 to 254; bottom half gets the next tile)

Byte 2

Attributes

76543210
||||||||
||||||++- Palette (4 to 7) of sprite
|||+++--- Unimplemented, reads back as 0
||+------ Priority (0: in front of background; 1: behind background)
|+------- Flip sprite horizontally
+-------- Flip sprite vertically

Byte 3

X position of left side of sprite

X values of $F9-$FF do NOT result in the sprite wrapping around to the left side of the screen.

Most programs write to a copy of OAM somewhere in CPU addressable RAM (often $0200-$02FF) and then copy it to OAM each frame using the OAM_DMA ($4014) register.
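To tie the four bytes together, here is a minimal Python sketch that decodes one OAM entry according to the layouts above. The function name, parameters, and dictionary keys are hypothetical; `sprite_table` stands in for the 8x8 sprite pattern table selected in PPUCTRL.

```python
def decode_sprite(entry: bytes, tall_sprites: bool = False,
                  sprite_table: int = 0x0000) -> dict:
    """Decode a 4-byte OAM entry (Y, tile, attributes, X)."""
    y, tile, attr, x = entry
    if tall_sprites:
        # 8x16 mode: bit 0 picks the bank, the remaining bits the top tile.
        pattern_table = 0x1000 if tile & 0x01 else 0x0000
        top_tile = tile & 0xFE
    else:
        pattern_table = sprite_table
        top_tile = tile
    return {
        "screen_y": y + 1,            # sprite data is delayed by one scanline
        "x": x,
        "pattern_table": pattern_table,
        "tile": top_tile,
        "palette": 4 + (attr & 0x03),
        "behind_background": bool(attr & 0x20),
        "flip_horizontal": bool(attr & 0x40),
        "flip_vertical": bool(attr & 0x80),
        "hidden": y >= 0xEF,
    }
```

A shadow copy of OAM in CPU RAM can be walked four bytes at a time with this function, for example when building a debugging view of the sprite list.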

Sprite evaluation

During all visible scanlines, the PPU scans through OAM to determine which sprites to render on the next scanline. During each pixel clock (341 total per scanline), the PPU accesses OAM in the following pattern:

  1. Cycles 0-63: Secondary OAM (32-byte buffer for current sprites on scanline) is initialized to $FF - attempting to read $2004 will return $FF
  2. Cycles 64-255: Sprite evaluation
    • On even cycles, data is read from (primary) OAM
    • On odd cycles, data is written to secondary OAM (unless writes are inhibited, in which case it will read the value in secondary OAM instead)
    • 1. Starting at n = 0, read a sprite's Y-coordinate (OAM[n][0]), copying it to the next open slot in secondary OAM (unless 8 sprites have been found, in which case the write is ignored).
    • 1a. If Y-coordinate is in range, copy remaining bytes of sprite data (OAM[n][1] thru OAM[n][3]) into secondary OAM.
    • 2. Increment n
    • 2a. If n has overflowed back to zero (all 64 sprites evaluated), go to 4
    • 2b. If fewer than 8 sprites have been found, go to 1
    • 2c. If exactly 8 sprites have been found, disable writes to secondary OAM
    • 3. Starting at m = 0, evaluate OAM[n][m] as a Y-coordinate.
    • 3a. If the value is in range, set the sprite overflow flag in $2002 and read the next 3 entries of OAM (incrementing 'm' after each byte and incrementing 'n' when 'm' overflows); if m = 3, increment n
    • 3b. If the value is not in range, increment n AND m (without carry). If n overflows to 0, go to 4; otherwise go to 3
    • 4. Attempt (and fail) to copy OAM[n][0] into the next free slot in secondary OAM, and increment n (repeat until HBLANK is reached)
  3. Cycles 256-319: Sprite fetches (8 sprites total, 8 cycles per sprite)
    • 1-4: Read the Y-coordinate, tile number, attributes, and X-coordinate of the selected sprite
    • 5-8: Read the X-coordinate of the selected sprite 4 times.
    • On the first empty sprite slot, read the Y-coordinate of sprite #63 followed by $FF for the remaining 7 cycles
    • On all subsequent empty sprite slots, read $FF for all 8 reads
  4. Cycles 320-340: Background render pipeline initialization
    • Read the first byte in secondary OAM (the Y-coordinate of the first sprite found, sprite #63 if no sprites were found)

This pattern was determined by doing carefully timed reads from $2004 using various sets of sprites. Once 8 sprites have been found on a scanline, the sprite evaluation logic effectively breaks and starts evaluating the tile numbers, attributes, and X-coordinates of other sprites as Y-coordinates, resulting in rather inconsistent sprite overflow behavior (both false positives and false negatives).
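The evaluation steps listed above (steps 1 through 3 of cycles 64-255) can be sketched in Python to show how the overflow flag goes wrong. This is only a logical model under stated assumptions: it ignores cycle timing, stops as soon as the flag is set, and uses a hypothetical `scanline` argument compared directly against the stored Y bytes.

```python
def evaluate_sprites(oam: bytes, scanline: int, sprite_height: int = 8) -> tuple:
    """Return (indices copied to secondary OAM, sprite overflow flag)."""
    def in_range(y: int) -> bool:
        return 0 <= scanline - y < sprite_height

    found = []
    n = 0
    # Steps 1-2: copy in-range sprites until 8 are found or all 64 are checked.
    while n < 64 and len(found) < 8:
        if in_range(oam[4 * n]):
            found.append(n)
        n += 1

    # Step 3: with 8 sprites found, m is incremented along with n, so bytes
    # other than the Y-coordinate get evaluated as if they were Y-coordinates.
    overflow = False
    m = 0
    while n < 64:
        if in_range(oam[4 * n + m]):
            overflow = True
            break
        n += 1
        m = (m + 1) & 3   # incremented without carry into n
    return found, overflow
```

In this model, a ninth in-range sprite placed directly after the eighth sets the flag, but one placed a slot later can be missed because its tile byte is tested instead of its Y byte. Conversely, a hidden sprite whose tile byte happens to look like an in-range Y can set the flag with only eight sprites present.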

The sprite priority system has a quirk when the background, a front-priority sprite, and a back-priority sprite are in the same area. Games such as Super Mario Bros. 3 take advantage of this.

Nametables

     (0,0)     (256,0)     (511,0)
       +-----------+-----------+
       |           |           |
       |           |           |
       |   $2000   |   $2400   |
       |           |           |
       |           |           |
(0,240)+-----------+-----------+(511,240)
       |           |           |
       |           |           |
       |   $2800   |   $2C00   |
       |           |           |
       |           |           |
       +-----------+-----------+
     (0,479)   (256,479)   (511,479)

The NES has four nametables, arranged in a 2x2 pattern. Each occupies a 1 KiB chunk of PPU address space, starting at $2000 at the top left, $2400 at the top right, $2800 at the bottom left, and $2C00 at the bottom right. Each byte in the nametable controls one 8x8 pixel character cell, and each nametable has 30 rows of 32 tiles each, for 960 ($3C0) bytes; the rest is used by each nametable's attribute table. With each tile being 8x8 pixels, this makes a total of 256x240 pixels in one map, the same size as one full screen.

But the NES system board itself has only 2 KiB of VRAM (called CIRAM), enough for two nametables; hardware on the cartridge controls address bit 10 of CIRAM to map one nametable on top of another.

  • Vertical mirroring: $2000 equals $2800 and $2400 equals $2C00 (e.g. Super Mario Bros.)
  • Horizontal mirroring: $2000 equals $2400 and $2800 equals $2C00 (e.g. Kid Icarus)
  • One-screen mirroring: All nametables refer to the same memory at any given time, and the mapper directly manipulates CIRAM address bit 10 (e.g. many Rare games using AxROM)
  • Four-screen mirroring: CIRAM is disabled, and the cartridge contains additional VRAM used for all nametables (e.g. Gauntlet, Rad Racer 2)
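For the two fixed arrangements, the mapping from a nametable address to the console's 2 KiB of CIRAM comes down to which PPU address bit drives CIRAM address bit 10. The Python sketch below illustrates this; the function name and the string mode names are hypothetical, and one-screen and four-screen modes are not modeled.

```python
def ciram_address(ppu_addr: int, mirroring: str) -> int:
    """Map a nametable address ($2000-$2FFF) to an offset in the 2 KiB CIRAM."""
    offset = ppu_addr & 0x03FF         # position inside one 1 KiB nametable
    table = (ppu_addr >> 10) & 0x03    # 0: $2000, 1: $2400, 2: $2800, 3: $2C00
    if mirroring == "vertical":
        a10 = table & 1                # CIRAM A10 follows PPU A10
    elif mirroring == "horizontal":
        a10 = table >> 1               # CIRAM A10 follows PPU A11
    else:
        raise ValueError("only vertical and horizontal mirroring are modeled")
    return (a10 << 10) | offset
```

Under vertical mirroring this sends $2000 and $2800 to the same CIRAM offset, and under horizontal mirroring it does the same for $2000 and $2400, matching the first two bullets above.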

Attribute tables

+---+---+---+---+
|   |   |   |   |
+ D1-D0 + D3-D2 +
|   |   |   |   |
+---+---+---+---+
|   |   |   |   |
+ D5-D4 + D7-D6 +
|   |   |   |   |
+---+---+---+---+

This is admittedly one of the hardest things for a beginner to grasp about the PPU. The last 64 bytes of each nametable (at $23C0, $27C0, $2BC0, and $2FC0) are an "attribute table" related to the preceding nametable. Each byte in the attribute table controls the palette of a 4x4 cell (32x32 pixel) square, and two bits of each byte control the palette of a 2x2 cell (16x16 pixel) area. This is why most NES games used 16x16 metatiles (size of Super Mario Bros. ? block) or 32x32 metatiles (width of SMB pipe).
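Since this layout tends to confuse beginners, a worked lookup may help. The Python sketch below computes, for a tile at column `tile_x` (0 to 31) and row `tile_y` (0 to 29), which attribute byte applies and which two bits within it. It assumes the 64 bytes are stored row by row, eight per row; the function names are hypothetical.

```python
def attribute_lookup(tile_x: int, tile_y: int, nametable_base: int = 0x2000) -> tuple:
    """Return (attribute byte address, bit shift) for one nametable tile."""
    # One attribute byte covers a 4x4 tile square; there are 8 such squares per row.
    address = nametable_base + 0x3C0 + (tile_y // 4) * 8 + (tile_x // 4)
    # Within the byte: D1-D0 top left, D3-D2 top right, D5-D4 bottom left, D7-D6 bottom right.
    shift = ((tile_y & 2) << 1) | (tile_x & 2)
    return address, shift


def palette_for_tile(attribute_byte: int, shift: int) -> int:
    """Extract the 2-bit background palette number."""
    return (attribute_byte >> shift) & 0x03
```

For example, tiles (0,0), (1,0), (0,1), and (1,1) all share shift 0 of the byte at $23C0, which is why a palette cannot change within a 16x16 pixel area.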

Nametable tiles are 8x8 pixels, and the $2001 mask is 8 pixels wide, but attribute table tiles are 16x16 pixels. This is why games that use the horizontal or vertical mirroring mode for diagonal scrolling often have color artifacts on one side of the screen (on the right side in Super Mario Bros. 3; on the trailing side of the scroll in Kirby's Adventure; at the top and bottom in Super C).

Palettes

The palette for the background runs from VRAM $3F00 to $3F0F; the palette for the sprites runs from $3F10 to $3F1F. Each color takes up one byte.

$3F00 Universal background color
$3F01-$3F03 Background palette 0
$3F05-$3F07 Background palette 1
$3F09-$3F0B Background palette 2
$3F0D-$3F0F Background palette 3
$3F11-$3F13 Sprite palette 0
$3F15-$3F17 Sprite palette 1
$3F19-$3F1B Sprite palette 2
$3F1D-$3F1F Sprite palette 3

Addresses $3F04/$3F08/$3F0C can contain unique data, though these values are not used by the PPU when rendering. Addresses $3F10/$3F14/$3F18/$3F1C are mirrors of $3F00/$3F04/$3F08/$3F0C.

Another way of looking at it:

43210
|||||
|||++- Pixel value from tile data
|++--- Palette number from attribute table or OAM
+----- Background/Sprite select
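The 5-bit view and the mirroring of the sprite palette's first entries can be expressed compactly. In this Python sketch the function names are hypothetical, and `palette` is the 2-bit number (0 to 3) taken from the attribute table or OAM.

```python
def palette_index(pixel: int, palette: int, is_sprite: bool) -> int:
    """Build the 5-bit palette RAM index from its three components."""
    return (int(is_sprite) << 4) | ((palette & 0x03) << 2) | (pixel & 0x03)


def palette_ram_offset(ppu_addr: int) -> int:
    """Reduce a $3F00-$3F1F address to the byte of palette RAM it selects."""
    index = ppu_addr & 0x1F
    # $3F10/$3F14/$3F18/$3F1C are mirrors of $3F00/$3F04/$3F08/$3F0C.
    if (index & 0x13) == 0x10:
        index &= 0x0F
    return index
```

A renderer would additionally substitute the universal background color at $3F00 whenever the pixel value is 0; that step is left out here.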

As on most early color game consoles, the NES palette is based on Hue/Saturation/Value:

76543210
||||||||
||||++++- Hue
||++----- Value
++------- Unimplemented, reads back as 0

Hue $0 is light gray, $1-$C are blue to red to green to cyan, $D is dark gray, and $E-$F are black. The canonical code for "black" is $0F. It works this way because of the way colors are represented in an NTSC or PAL signal, with the phase of a color subcarrier controlling the hue. For details, see NTSC Video.
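A small Python sketch of the byte layout and the hue description above follows. The function names and the category labels are hypothetical, and the classification simply restates the hue ranges given in the text.

```python
def decode_color(value: int) -> dict:
    """Split a palette byte into its hue (bits 0-3) and value (bits 4-5)."""
    value &= 0x3F                      # the top two bits are unimplemented
    return {"hue": value & 0x0F, "value": (value >> 4) & 0x03}


def hue_category(value: int) -> str:
    """Classify a palette byte by its hue column, following the text above."""
    hue = value & 0x0F
    if hue == 0x0:
        return "light gray"
    if hue <= 0xC:
        return "chromatic"
    if hue == 0xD:
        return "dark gray"
    return "black"
```

For instance, the canonical black $0F decodes to hue $F with value 0 and falls in the "black" category.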

Note that most VS Unisystem arcade PPUs have completely different palettes, and Playchoice-10 PPUs render hue $D as black.

NTSC/PAL differences

Consoles in the USA and Europe run at different speeds due to the different television standards used. The main differences between the NTSC and PAL PPUs are as follows:

Property            NTSC                               PAL
Master clock speed  21.477272 MHz (= 236.25 MHz / 11)  26.601712 MHz
PPU clock speed     21.477272 MHz / 4                  26.601712 MHz / 5
Length of VBLANK    20 scanlines                       70 scanlines
Timing quirks       First scanline is one PPU cycle    None
                    shorter during every other frame

Other than these differences (and video output), the NTSC and PAL PPUs function exactly the same.

Important timing issues

The NTSC video signal is made up of 262 scanlines, 20 of which are spent in vblank. After the program has received an NMI, it has about 2270 CPU cycles to update the palette, sprites, and nametables as necessary before rendering begins.
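The cycle budget can be checked with exact arithmetic. This Python sketch assumes 341 PPU cycles per scanline and the ratios of 3 PPU cycles per CPU cycle on NTSC and 3.2 on PAL; the names are hypothetical.

```python
from fractions import Fraction

PPU_CYCLES_PER_SCANLINE = 341
NTSC_PPU_PER_CPU = Fraction(3)
PAL_PPU_PER_CPU = Fraction(16, 5)   # 3.2 PPU cycles per CPU cycle


def cpu_cycles(scanlines: int, ppu_per_cpu: Fraction) -> Fraction:
    """Exact number of CPU cycles that elapse during the given scanlines."""
    return Fraction(scanlines * PPU_CYCLES_PER_SCANLINE) / ppu_per_cpu


ntsc_vblank = cpu_cycles(20, NTSC_PPU_PER_CPU)   # 6820/3, a little over 2273
pal_vblank = cpu_cycles(70, PAL_PPU_PER_CPU)     # 59675/8, a little over 7459
```

The NTSC result is the source of the "about 2270" figure; the usable budget is somewhat smaller in practice because the NMI handler itself takes time to start.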

PPU Frame Timing has some new timing information.

Special tricks

More on raster effects: Each scanline takes 341/3 = 113+2/3 CPU cycles (on NTSC) or 341/3.2 = 106+9/16 CPU cycles (on PAL). Setting the current display position (through some twiddling of $2005 and $2006) at the start of each scanline can create pseudo-parallax scrolling and image warping; this is how Rad Racer and all the other Pole Position clones work.

As CNROM games began to run up against the 32KB program ROM size limit, some games such as Milon's Secret Castle stored map data in CHR ROM (ROM connected to the PPU address lines) and read it through $2007. Super Mario Bros. also stored its title screen layout data in CHR ROM.

The palette contains 48 colors and 6 grays (4 grays on the RGB PPU of the PlayChoice 10, which lacks the $xD levels); a program can normally display 25 of these on the same screen. However, mid-frame palette swapping combined with the color emphasis bits in $2001 can create over 400 simultaneous colors, something that few commercial games used but accurate emulators must allow for. Still, an emulator running on a low-spec platform whose video has a 256-color palette can usually get away with limiting each frame to four distinct color emphasis settings.

The $0D black color should not be used. It produces a voltage level lower than the black level (aka "blacker than black"). Some TVs will see this as the sync level and not show the picture correctly.
