
What makes accurate emulation of old systems a difficult task?


Thanks to a lot of passionate and skilled people, we can emulate pretty much any retro platform today.



For most people they perform well enough, but in fact a substantial number of these emulators are hardly accurate when compared to the original hardware they are trying to emulate.



While some of them do accurately emulate the target platform, that generally comes with a huge performance penalty, making this a never-ending quest for perfect emulation.



My question is the following:



What makes accurate emulation of some old systems a difficult thing, if not impossible?



Obviously this is a fairly broad topic with potentially infinite answers, so let me know if and how the question could be improved to be a good fit for the website.










asked yesterday, edited yesterday – Aybe

  • I hope someone can speak to this in detail in an actual answer, but wanted to point out that one challenge is "extensibility" of older systems. By this I simply mean that the main console may be accurately emulated, but you are unable to run a certain game (accurately, or at all) due to custom hardware on a cartridge (extended ROM, custom sound chips, etc.). – tolos, yesterday

  • I remember reading that DosBox originally wrote their emulator architecture perfectly from technical documents, and because of that, it didn't work. Clever hacks and tricks that were added to or used by the chips in question didn't exist in the proper documentation. There is also the issue that chips commonly have errors in their designs or manufacturing that change the behavior of the machine overall. Then there is purposeful misuse of a feature, such as caching textures in sound memory or using the video pipeline to parallel-process AI data. Read the Dolphin reports and just say, "wow". – Kayot, yesterday

  • @Kayot DOSBox never attempted to be a perfect emulator at any level. Even today it doesn't implement certain documented features simply because no game ever used them. – Ross Ridge, yesterday

  • @Zibbobz by age five or so, making the sound of the disk reader shifting back and forth is pretty much the only thing my PS1 was still able to do. – Tommy, yesterday

  • Define "accurate". – Thorbjørn Ravn Andersen, yesterday











9 Answers
68 votes














Speaking from my personal experience of writing a PET emulator, a C64 emulator, and a Sinclair Spectrum emulator, here are the issues I had:



Getting the Speed Right



It's no good just making a processor go as fast as it can because, frequently, application code depends on timing. For old 8-bit machines, it's easy to write an emulator that runs at many times the speed of the original. The trouble is that this has a knock-on effect. For example, the PET Space Invaders program goes way too fast to be playable. Not that it matters, because its key-scanning code is sped up by the same factor, which means that when you press a key, your gun is uncontrollable.



The same issue applies to the C64. The interrupt is driven by one of the IO chips which needs to be synchronised to the CPU's clock. This means pressing a key for even a brief time is the same as pressing it for several seconds on a real C64.



So you need to throttle the performance to something like the original speed. Unfortunately, that means having a clock with microsecond accuracy. Such things do not really exist in modern general-purpose PC operating systems: your CPU thread just has to be scheduled out of the processor and it will probably miss several microseconds. One way to get around this is to raise some event every 1/60th of a second (the probable refresh rate of your monitor and, handily, of NTSC TVs) and, when the CPU has executed 1/60th of a second's worth of instructions, just make it wait until the event occurs. Unfortunately, that makes doing sound on a PET or a Spectrum difficult, because they both rely on the CPU toggling bits in IO registers at the right frequency.
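A minimal sketch of that 1/60th-of-a-second throttling idea, assuming a hypothetical Cpu type with a run_cycles() method; note that std::this_thread::sleep_until has exactly the millisecond-scale wakeup jitter discussed above, so this pins the average speed rather than giving microsecond accuracy:

```cpp
#include <chrono>
#include <thread>

struct Cpu { void run_cycles(long n) { (void)n; /* execute n cycles */ } };

// Run 1/60th of a second's worth of instructions, then wait for the next
// 60 Hz tick. Average speed matches the original machine even though
// individual instructions run much faster than real time.
void run_throttled(Cpu& cpu, long clock_hz) {
    const auto slice = std::chrono::microseconds(1000000 / 60);
    const long cycles_per_slice = clock_hz / 60;
    auto next = std::chrono::steady_clock::now() + slice;
    for (;;) {
        cpu.run_cycles(cycles_per_slice);
        std::this_thread::sleep_until(next);   // scheduler jitter lands here
        next += slice;
    }
}
```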



Parallelism



In a real computer, there are several components that all operate concurrently. For example, the C64 has a VIC-II chip for the display, a sound chip, some IO chips and a 6510, and they are all synchronised by the same clock. The easiest way to deal with this is to have a loop in which you execute an instruction in the CPU and then update all the other components with the new clock time. Unfortunately, this is serial by nature, and you have to be careful about doing complex stuff in case it makes your emulation too slow.
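As a sketch of that serial "catch-up" loop (the chip names and interfaces here are illustrative stand-ins, not a real implementation):

```cpp
// Execute one CPU instruction, then let every other chip advance to the
// same point in emulated time before the next instruction runs.
struct Cpu { long step()         { return 2; /* cycles the instruction took */ } };
struct Vic { void run_to(long t) { (void)t;  /* render up to cycle t */ } };
struct Sid { void run_to(long t) { (void)t;  /* produce audio up to cycle t */ } };
struct Cia { void run_to(long t) { (void)t;  /* timers/interrupts up to cycle t */ } };

void main_loop(Cpu& cpu, Vic& vic, Sid& sid, Cia& cia) {
    long clock = 0;                // master clock, in CPU cycles
    for (;;) {
        clock += cpu.step();       // one instruction at a time
        vic.run_to(clock);         // serial catch-up of each device
        sid.run_to(clock);
        cia.run_to(clock);
    }
}
```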



An alternative is to put each component in its own thread, taking advantage of the multiple cores of modern computers, but then you have the problem of synchronisation. All of your components will need to access your emulation of the memory bus, and they all need access to the same copy. So, you might emulate the clock with a boolean that is toggled every 0.5 microseconds (in emulation time, see above) by the CPU. Unfortunately, modern processor cores have caches between themselves and the main memory. If the thread emulating the CPU core toggles a boolean representing the clock, it may only actually be altering the cached version of that variable, and the other components won't see it. There are OS functions that allow you to force the cached version of a variable to main memory, but they incur a significant performance penalty: it's about 100 times slower to access main memory than L1 cache.
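In C++ terms, the shared clock would have to be something like a std::atomic rather than a plain bool; the release/acquire pair below is what forces each toggle out to the other cores, and it is also where the per-access cost comes from. A sketch, assuming only the CPU thread ever writes the flag:

```cpp
#include <atomic>

std::atomic<bool> clock_phase{false};   // shared between chip threads

// Called by the CPU thread every emulated half-microsecond.
void cpu_toggle_clock() {
    // Safe despite the separate load and store: only this thread writes.
    bool next = !clock_phase.load(std::memory_order_relaxed);
    clock_phase.store(next, std::memory_order_release);
}

// Called by the video/sound/IO threads.
bool device_sample_clock() {
    return clock_phase.load(std::memory_order_acquire);
}
```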



Documentation



Documentation for old computers can be quite hard to find and may not be detailed enough for constructing an accurate emulator. For example, if you want an accurate Z80 emulation, you need to understand that there is an undocumented "W" register which affects the behaviour of some of the undocumented Z80 instructions. In theory you don't need to care about those, but in practice some popular game might have used them. The behaviour of the W register has been painstakingly reverse-engineered by enthusiasts, but sometimes they get it wrong.



The other problem with old documentation is that it frequently contains errors. A popular book on the 6502 was the Zaks book, Programming the 6502. I remember that my Dad's copy of it was festooned with hand written annotations correcting all the errors that he discovered by bitter experience.



Graphics



Getting the graphics right is pretty hard. I started by just taking a dump of the graphics memory every 1/60th of a second and drawing it in a window. I progressed to doing that in the GPU, but it is still not right. C64 programmers were adept at changing the graphics mode on the fly so they could use mixed modes on the screen. Even the Spectrum effect of the rapidly moving stripes in the border while the tape is loading is done by rapid changes to the background colour. You can't just snapshot the state and render it every 1/60th of a second; you effectively have to know the state at the end of every scan line on the VDU and, in fact, on the C64 I believe it was possible to split the screen vertically by carefully timed mode changes during a scan line. I haven't solved that yet.
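A common shape for that per-scanline approach is to latch the video registers once per line and draw each line with its latched copy; a sketch (types and names are illustrative, and mid-line splits would need still finer-grained latching):

```cpp
#include <array>

struct VideoState { int mode; int background_colour; int scroll_x; };

// One latched copy of the video registers per scan line, captured by the
// emulation core as the emulated "beam" passes each line.
template <int LINES>
void render_frame(const std::array<VideoState, LINES>& latched,
                  void (*draw_line)(int line, const VideoState&)) {
    for (int line = 0; line < LINES; ++line)
        draw_line(line, latched[line]);   // state as it was on that line
}
```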



Sound



Timing is actually more important for sound than for graphics. A film is projected at 24 frames per second and our brains easily fill in the gaps; try something similar with sound and you'll soon notice. For this reason, I haven't even attempted to emulate sound on the PET or the Spectrum. The C64 should be easier because it had a sound chip that you sent commands to, rather than having to toggle an output wire very fast, but I haven't done that bit yet.



Development Tools



You'll need to create test programs for your emulation which means having a development suite. I was lucky in that Z80 and 6502 are both relatively well supported. Other architectures are not so fortunate. Not finding a good toolchain for 68000 stopped me from bothering with that architecture.





Responses to Comments



I woke up this morning and mistook the number of comments I had as my accumulated overnight score. (I nearly fell off my chair when I saw what my actual score was, thank you.) There are many points to answer which I don't feel I can do in the comments easily, so I will put my answers here.




I feel [your answer] understates how big of an issue parallelism is ~ Austin Hemmelgarn




Yes it does. But I have so far only attempted to write emulators for systems from the '80s, which tend to be slow by today's standards and also very simple. Thus, you can get away with serialising a lot of the parallel components and, in fact, that might be desirable, given that running faster than the original is itself a problem.



Ironically, I have found that components that run off different clocks are sometimes easier to work with because the protocols between them cannot rely on clock synchronisation, so there is usually some hand shaking mechanism that means they don't rely on absolute timings.



As a simple example that expands on Luaan's reply, the PET interrupt is driven by the vertical sync of the monitor (or it appears to be), so I simply had a task triggered by the actual monitor refresh which is 60 per second that raises an interrupt. This meant that the keyboard scanned at the right speed and the cursor blinked at the right speed even though the CPU was running at about 80MHz. By contrast, on the C64, the same interrupt is driven by a timer in one of the IO chips which itself is driven from the CPU clock, which means it looks like the C64 runs ridiculously fast.




This is only true if you decide to suspend your thread. And you actually miss somewhere around 1 millisecond which makes thread sleeping very undesirable for game dev in general. You can create your desired delay without releasing the thread. This allows you to get microsecond precision. ~~ Chris Rollins




On a modern general-purpose PC, you have no choice about when your threads get suspended. My laptop has 8 cores and 16 hyperthreads, but there are way more threads than that running on it at any one time. Most are suspended at any one time, but it's quite possible for the emulated CPU thread to be pre-emptively suspended in periods of high load without you doing anything. Furthermore, in order to time things at the microsecond level, you need a microsecond-accurate clock. Note: precision is not accuracy. My laptop has a nanosecond-precision clock, but it would be a stretch to assume you can measure time periods accurately to the nanosecond with it. For one thing, it requires a system call, and system calls are relatively expensive and also lay you open to rescheduling of your thread. Your mileage may vary depending on your operating system.




Don't modern sound cards buffer sound data so you don't have to bit-bang them? ~ snips n snails




I had thought of counting the number of toggles in a certain period and counting the CPU clock cycles to get a frequency to send to the sound card, but I don't know enough about programming modern sound cards to say if that is reasonable or would give good fidelity. I guess you actually don't need it for a Spectrum or a PET :) . Having just read the next comment by Ilmari Karonen, this is the same approach. I think you'd need a shorter period than a frame (I assume you mean a video frame).
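A sketch of that record-and-play-back idea for bit-banged sound (all names illustrative): log each speaker-bit toggle with its cycle timestamp during the frame, then expand the log into an output buffer afterwards. Nearest-sample expansion like this aliases audibly; it is only the skeleton of the approach:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Toggle { long cycle; bool level; };   // one logged speaker-bit write

std::vector<int16_t> render_audio(const std::vector<Toggle>& log,
                                  long cycles_per_frame,
                                  int samples_per_frame) {
    std::vector<int16_t> out(samples_per_frame);
    std::size_t i = 0;
    bool level = false;
    for (int s = 0; s < samples_per_frame; ++s) {
        // Emulated cycle corresponding to this output sample.
        long cycle = static_cast<long>(s) * cycles_per_frame / samples_per_frame;
        while (i < log.size() && log[i].cycle <= cycle)
            level = log[i++].level;           // replay toggles up to here
        out[s] = level ? 8000 : -8000;        // crude square wave
    }
    return out;
}
```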




Without a barrier, your later loads don't wait for your earlier stores to be visible to other threads, ~ Peter Cordes




Yes, I admit I was playing fast and loose with the actual mechanics of cache and memory synchronisation in the interests of brevity. The point is that ensuring correct sequencing of memory updates requires special, expensive synchronisation such as memory barriers. This is over and above the penalty of going to main memory relative to the cache.






  • This is a good answer, but I feel it understates how big of an issue parallelism is. A lot of systems (especially game consoles) not only have multiple components running in parallel, but often on multiple differing clocks, with completely different code translation required. For example, a Sega Saturn has 9 separate 'core' chips to emulate (2 SH-2 chips, one m68k, one SH-1, two custom video chips, a custom synth/DSP for audio, and a custom MCU mediating all the busses), covering 5 differing instruction sets running at four different clock speeds. – Austin Hemmelgarn, yesterday

  • You're wrong about something important here. "having a clock with microsecond accuracy. Such things do not really exist in modern general purpose PC operating systems. Your CPU thread just has to be scheduled out of the processor and it will miss several microseconds probably." <- This is only true if you decide to suspend your thread. And you actually miss somewhere around 1 millisecond, which makes thread sleeping very undesirable for game dev in general. You can create your desired delay without releasing the thread. This allows you to get microsecond precision. – Chris Rollins, yesterday

  • Could it be practical to handle sound (and possibly other bit-banging effects) by recording changes to I/O registers (with cycle counts for relative timing) while running the emulated code for a frame, and then "playing them back" to reconstruct the 1/60-second audio waveform (and possibly any graphical effects) that those changes would have produced? This would cause the audio to be delayed slightly, but at worst by one frame, which shouldn't be too noticeable (especially if the graphics are delayed by the same time as well). – Ilmari Karonen, yesterday

  • Speaking of Space Invaders, on the original the reason the enemies got faster when there were fewer was that drawing them all on the screen took less time (not because the game was designed to do so, but the designer intentionally didn't take it out). So not only do you want the game to run at the correct speed, you would also want to ensure it still properly speeds up towards the end. – Captain Man, yesterday

  • Your description of cache vs. multithreading is based on a common misconception that other cores can still load "stale" values from their cache after another core modifies its cached copy of a line. Actually, CPUs use (a variant of) MESI to ensure cache coherency. See "Is mov + mfence safe on NUMA?". The actual problem is memory reordering introduced by the store buffer, before stores have committed to L1d cache. Without a barrier, your later loads don't wait for your earlier stores to be visible to other threads, so you only get acq/rel, not seq_cst. – Peter Cordes, yesterday



















21 votes














The best description of this problem that I've seen was written by Byuu, one of the developers of the bsnes emulator. That's about as authoritative as they come, but I'll try to summarize it here.



Modern software development is a completely different world than retro consoles. There were no reusable game engines or manufacturer-provided APIs. Everything was written from complete scratch and ran on the bare metal. The code was deeply tied into the raw hardware, including aspects of the hardware that were unintentional. The article has an example of a game that relied on the SNES processor's unusual behavior in a rare edge case. If that edge case is not perfectly emulated, or if the timing is even slightly off, then the game deadlocks.

Consoles frequently had multiple chips inside (CPU, audio processor, graphics, etc.) that all ran at different speeds. Emulators have to emulate all of these precisely, down to the exact delays required for one part of the hardware to send a result back to another. Not doing it exactly right can result in audio and video getting out of sync, graphics being drawn during the wrong frame, etc. The only way to accurately emulate timing at that level of detail is to run significantly faster than the original hardware.



Cartridge-based games frequently made this problem worse. The cartridge or disk for a modern game is merely some sort of memory containing software. Retro game cartridges frequently contained additional hardware as well, anything from extra graphics accelerators to memory mappers. These have to be emulated in order for the software to work, and many of these have no publicly-available documentation.



Some games also took advantage of the way that old CRT screens drew images. They would do things like modify the registers that controlled graphics output while in the middle of painting a scanline on the screen, in order to create specific visual effects. These visual effects either don't work at all on modern LCD screens or produce very different results unless the emulation engine can detect what the code is trying to do and compensate for it.



I encourage you to read the article because I'm giving a bad summary but the TL;DR version is this: most emulators focus on the "ideal" hardware behavior, and most games will run "good enough" to be basically playable. You'll also suffer from hundreds of obscure bugs. The only way to make all games playable the exact same way that they played on real hardware is to emulate the inner workings of every single undocumented, proprietary chip in the console, quirks and bugs included. And that's really, really hard.






  • +1 for the best answer. I had this exact article in mind when I saw the question. – Mason Wheeler, yesterday

  • "Everything was written from complete scratch and ran on the bare metal. The code was deeply tied into the raw hardware, including aspects of the hardware that were unintentional." - exactly this. – Stilez, yesterday

  • Standard note of caution: the linked article is an individual using a public platform to try to explain to an audience what he thinks are his personal achievements and the motivations for them. So although it's better than, say, a politician's Twitter feed, because there was an editor involved, it's still not especially objective. – Tommy, yesterday

  • 'proprietary' strikes again, +1 – Mazura, 23 hours ago



















18 votes














Almost all computer systems have multiple different devices operating in parallel. For example, the CPU is typically running parallel to the video generation hardware.



This can be very difficult to emulate accurately, as interactions between different devices require that the emulation of each device proceeds in sync with the rest. Each device may be running on a different clock, and it becomes necessary to accurately emulate things like bus contention and DMA slot allocation.



There is also the issue of simply knowing how the original system actually behaves. Even now emulator developers are still finding new behaviours of classic hardware such as the Amiga and C64. Documentation is usually incomplete and back when those machines were popular developers would experiment with using the hardware in unintended ways, with results that were undocumented and sometimes not even consistent across revisions of the same machine.



Analogue processes are also very difficult to emulate accurately. For example, the way the Amiga generates sound uses pulse-width modulation and multiple analogue filters. Such systems can be measured and modelled, but producing an accurate reproduction digitally is tricky. It gets even worse when you consider systems that had built-in speakers or displays: to sound or look right, the properties of those must also be considered.



There are also issues with modern systems that make accurate emulation difficult. Input lag is a major one. Older systems tended to read the keyboard and joystick inputs as simple switches, resulting in minimal lag. Modern PCs use USB, often with a polling interval of 8ms (the fastest possible for USB 1.1). There is also often lag between the PC and the display output, whereas older systems had near-zero lag between the hardware generating a video signal and it appearing on the monitor.



Monitor refresh rates are also a problem. PC monitors rarely go below 60Hz and graphics cards rarely offer arbitrary refresh rates. Most older machines were either 50Hz (PAL/SECAM) or 59.94Hz (NTSC) so there is a mismatch between one emulated frame and one frame displayed on the host PC. Arcade systems often had odd frame rates such as 57.5Hz too, which even analogue TVs and monitors tend to struggle with.






  • "It gets even worse when you consider systems that had build in speakers or displays, but to sound/look right the properties of those must also be considered." - one example is the original non-backlit GameBoy Advance, which caused colours to be blurry and less bright, so some GBA emulators have a setting to make the colours less colourful and blurrier. Even Nintendo's official hardware had this problem with the successors that had backlit screens, and they didn't bother to correct it.

    – immibis
    yesterday








  • 2





    @immibis, a better example would be NTSC artifact colors: rapid brightness changes in the framebuffer turn into color changes on the screen.

    – Mark
    yesterday



















12 votes














I'm the author of an emulator of a whole bunch of platforms, so I'm going to go ahead and commit a heresy: accurate emulation of old systems isn't a difficult task. It's just an expensive one.



It's expensive in terms of processing. Which means that a lot of the more historical emulators make conscious approximations in order to hit their performance budgets. These usually involve making things falsely atomic — e.g. implementing a processor as something that (i) performs an instruction instantaneously; and then (ii) time warps to where it should have been had it spent time doing the processing. This is the sort of thing that has visible side effects as soon as there are any other chips in the system that work as a function of time.



Trivial example: a 6502 read-modify-write cycle is likely to: (i) read the value to mutate; (ii) write it back again, without modification (while the processor is working on the real result); then (iii) write it back modified.



Suppose you were doing something like mutating a video display processor's status register: the fact that there is a rewrite of the original value, and that the meaningful write doesn't occur until two cycles after the read, might be important. So if you're not genuinely serialising bus interactions, then you're likely to be in trouble.
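A sketch of what genuinely serialised bus interactions look like for that 6502 read-modify-write case; bus_read/bus_write are stubbed here, but in a real emulator each would also advance the master clock by one cycle, so a memory-mapped device sees all three accesses, dummy write included:

```cpp
#include <cstdint>

static uint8_t memory[65536];

// Stubs: real versions would tick the master clock and dispatch to
// memory-mapped devices on every call.
static uint8_t bus_read(uint16_t addr)             { return memory[addr]; }
static void    bus_write(uint16_t addr, uint8_t v) { memory[addr] = v; }

// INC on an absolute address, serialised the way the real chip does it.
void inc_absolute(uint16_t addr) {
    uint8_t v = bus_read(addr);   // (i)   read the value to mutate
    bus_write(addr, v);           // (ii)  dummy write of the old value
    ++v;
    bus_write(addr, v);           // (iii) write the modified value
    // ... set N and Z flags from v ...
}
```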



The term is primarily marketing puff but this is the sort of thing that scenesters refer to as 'cycle accuracy'.



You can optimise a large part of it away but usually only by introducing substantial coupling within your code — e.g. by having the processor know what's in different places in the memory map, exactly like real processors don't, and possibly having it communicate via a bunch of winks and nods. So then the cost is on maintainability and future code scalability.



It's expensive in terms of development. Not just initial, but subsequent. This partly cuts to the serialisation point that others have made: how do you make it look as though multiple components are executing at the same time?



In the most trivial case, you just implement them as coroutines and round-robin between them. In the olden days you might use a very loose precision for that scheduler; nowadays you might use whatever is the machine's necessary minimum unit of time. This is how you end up with a really slow emulator: quite a lot of time is spent on the scheduling itself, since it happens frequently, and you're thrashing your machine's caches by jumping around between sections of code and their working data sets.
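The trivial round-robin version, as a sketch (Component is a stand-in interface): both the scheduling overhead and the cache-hostile hopping between components are visible in the inner loop:

```cpp
#include <vector>

struct Component {
    virtual void step_one_tick() = 0;   // advance by the minimum time unit
    virtual ~Component() = default;
};

void run(std::vector<Component*>& parts) {
    for (;;)
        for (Component* p : parts)      // hop between code and data sets
            p->step_one_tick();         // one tick of each chip, in turn
}
```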



There are threading solutions but it's a judgment call which is appropriate in which situation, and they're not necessarily pleasant on your host CPU as spin locks tend to be the order of the day. Alternatively you can schedule optimistically, assuming component isolation, and just repeat work if you were wrong, but then you've got to get into the weeds of full state preservation for potential restoration.



I tend towards sequence points, i.e. identifying when it's predictable that two components can run in isolation for a period without any inaccuracy accruing, and then letting them do so; but then that prediction mechanism is a whole extra chunk of code that you need to spend a decent amount of time working on.



Historically it was expensive in terms of research. This cuts to the topics raised by others of figuring out exactly what components are really doing before you can implement them, which tends to differ from the documented behaviour. But for most things that the modern processing budget permits to be low-level emulated you're now talking twenty or thirty years of enthusiasts poking at them and documenting their findings. So a lot of this was difficult work, but it's difficult work that the rest of us can just parasitically consume.






8 votes














In nearly every computer, you have several things going on in parallel, even if it is just code execution on a CPU and screen refresh by the graphics card. In most cases, emulating the behaviour of things that happen serially is quite easy, but as soon as you have to synchronize parallel actions, emulation gets difficult.

I once hacked on a Game Boy emulator, and one of the performance-limiting factors of the video emulation was that games do change video controller parameters during scan-out, e.g. to change the background color between the score bar and the game screen, or to change the start offset of the screen scan-out to have a fixed score bar above or below a scrolling game screen. This means that in the general case, you have to draw each scan line separately and take into account the current video parameters to render it correctly, even though the graphics chip works with 8x8 tiles and (if the software did not change parameters) you could generate 8 lines at once with less overhead.

In many cases, changing video parameters is actively synchronized in CPU code, either by using a scan-line match interrupt (though possibly not on the Game Boy; I forget whether it has a scan-line interrupt) or by polling the "current scanline" register. In these cases, the emulator can provide synchronization by fudging values in the current scanline register to "probe" what value the software is waiting for, or by knowing the first scan line to apply the parameters from the interrupt configuration; but in some cases programmers just counted CPU cycles, so the emulator needs to know how many scan lines have elapsed between the latest synchronization and the current point in time.
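For the polling case, one crude form of that fudging is to advance the emulated beam a little on every read of the scanline register, so a busy-wait loop terminates without emulating every cycle in between. A sketch, using Game-Boy-flavoured constants (456 cycles per line, 154 lines) purely as assumptions:

```cpp
#include <cstdint>

struct Video {
    long cycle = 0;
    static const long CYCLES_PER_LINE = 456;   // assumed machine constant
    static const int  TOTAL_LINES     = 154;   // assumed machine constant

    // Read of the "current scanline" register: nudge time forward so
    // software that busy-waits on a particular line makes progress.
    uint8_t read_scanline_register() {
        cycle += CYCLES_PER_LINE / 4;          // beam moves on between polls
        long line = (cycle / CYCLES_PER_LINE) % TOTAL_LINES;
        return static_cast<uint8_t>(line);
    }
};
```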






8 votes














There has been a drift in the goals of video game emulators over the past decade or two. At first, getting emulated software to run at all was a major success, and was often done with simplified kludges and hacks. As the resources available to emulators have improved, the focus has moved more toward accurate reconstruction of what the original does. But why is that difficult to do?

Well, if software was written to follow a standard API and use documented system calls at all times, things would be fairly straightforward: just design a modern implementation of that API, and everything should work fine. But that's clearly not the case.

Sometimes these standardised interfaces don't meet the needs of programmers. Perhaps the functionality they want isn't provided (e.g. the PC BIOS's primitive graphics handling routines). Or maybe the "official" method is too slow, and you need the very fastest performance possible. This is often the case with computer games, which are of course the type of software that is most popular, emulation-wise.

At this point, the programmer would typically bypass the API and address the hardware directly. And this is where the problems start.* By trial and error, programmers will find, use, and rely on hardware behaviour that may not have been intended, or even known of, by the system's designers.

To accurately emulate a piece of hardware (for example a graphics processor or I/O chip) you need to know what it does, and how it does it. If you're using an off-the-shelf component, such as a 6522 VIA (I/O chip), then there will be some public documentation available. But if the programmer has found some undocumented functionality (or the system designer has written the operating system according to how the physical hardware behaves, without documenting it exhaustively) then there's no easy way for emulator writers to know how the game+system actually works.

One approach is to use a black-box method: you give some input to a physical system and see what output you get. But that's limited by your ability to think of all possible relevant situations, and to go off and test them all.

Another is to open the chip up (a process known as decapping) and photograph the transistors on the die, layer by layer. If you use these images to reconstruct the "circuit diagram" of the chip, you could manufacture your own duplicate chip, or simulate it in software. The resources required to decap a chip are significant, as you need to shave off tiny layers of the packaging, or use acids to eat away at it, before using a specialist camera to photograph the tiny die at high resolution. And once that's done, you have a jigsaw of thousands upon thousands (or even millions) of transistors that need to be traced out and joined up.

Once you know the exact structure of the chip you're trying to emulate, you then need to simulate the behaviour of every transistor in the chip. And, depending on the accuracy you're trying to achieve, you might eventually be attempting to simulate the behaviour of the electrons themselves. At this small scale, we're into the territory of quantum behaviour, too.

At which point the question arises: how accurate do you want your emulator to be?

*These problems affect backward compatibility as well as emulation. If you write code that bypasses the OS and addresses the sound chip directly, then it may not work if the manufacturer uses a different sound chip in the next model, or simply moves it to a different memory location.






5 votes














Aside from the great points made previously, emulating on anything but dedicated hardware runs into real-time OS issues. No current consumer operating system operates in strict real time: your tasks get access to resources as the OS sees fit. This is usually fairly prompt, but sometimes the OS might cause a glitch while it's busy doing something else.

Considering that some retrogamers avoid LCD monitors because they (apparently) have noticeable latency, OS-induced timing issues can be a major problem.






3 votes














Usually the difficulty is the gap between how things should work ideally in the digital domain, and how they actually work in the analog domain of the real physical world.

For example, take the SID audio chip from the Commodore 64. It has reasonable documentation, and people have reverse-engineered how it produces sound, so we should be able to write a model of it for an FPGA that behaves exactly like the original chip, clock cycle by clock cycle, or a matching program that runs on a computer.

But that is only the digital logic portion. Audio signals are converted from digital bits to analog voltages inside the chip, and that is by no means an ideal process: it picks up noise from nearby signals and may have non-linear distortion, and analog signals have limited speed when transitioning from one voltage to another. The SID chip even has analog filters built in, and external capacitors are needed to set the filter operating parameters. The chips can have large manufacturing tolerances: for example, the digital control of the analogue filter is said to vary a lot, and while the documentation says it is linear, it may actually be logarithmic, with a range that varies from chip to chip.

So if the digital chip runs at (approximately) a 1MHz clock, emulating the analog signal chain inside and outside the chip properly may require a much smaller timescale for good results, which requires a faster computer. Even if the analog audio were emulated at the (approximately) 1MHz clock, it would still be necessary to convert it to, for example, 48kHz for playback on your computer. There are various ways to do the resampling: just skipping samples would be very fast but would sound awful, while the theoretically correct way would be too slow. As always it is a tradeoff, and some simplifications may be necessary to strike a balance between not too slow and not too bad-sounding.
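As a sketch of the cheapest plausible compromise, box-filter averaging from the chip rate down to 48kHz; the 985248 Hz input rate below is the PAL C64 clock, used here as an assumption. Skipping samples would alias; a proper polyphase filter would sound better but cost more:

```cpp
#include <vector>

std::vector<float> downsample(const std::vector<float>& in,
                              long in_rate,    // e.g. 985248 (PAL C64 clock)
                              long out_rate) { // e.g. 48000
    std::vector<float> out;
    for (long s = 0; ; ++s) {
        long start = s * in_rate / out_rate;         // input span covered
        long end   = (s + 1) * in_rate / out_rate;   // by output sample s
        if (end > static_cast<long>(in.size())) break;
        float sum = 0.0f;
        for (long i = start; i < end; ++i) sum += in[i];
        out.push_back(sum / static_cast<float>(end - start));  // crude low-pass
    }
    return out;
}
```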



And that is only the audio chip. Imagine emulating a chip that outputs analog video. The problem is much less well defined, as sometimes it is not only what the video chip outputs but how a color TV would actually decode the colors from it, so the problem is more like what you would see on a color TV when fed the generated signal. The best example is how to show about 4000 artifact colors on an NTSC TV from a composite output signal generated by a CGA video adapter with at most 16 colors.

When computers become fast enough and GPU acceleration can be used, maybe we can skip all that and stop thinking about how to emulate some logic that is on a silicon die; maybe we can just scan the structures on the silicon and run a purely electrical simulation of the semiconductor structures, so we don't have to know what it is we are emulating, but it would just work.






3 votes














I wrote my own ZX Spectrum emulator with machine-cycle-level code timing and a 100% ZEXALL pass, so here are some insights:

1. Undefined behavior

Back in the day there were no sophisticated compilers like we have now; they just compiled your code into binary. So there were a lot of hidden bugs in the code, even in programs that worked fine, that today's compilers would complain about right away.

I still remember my surprise while porting my old code to newer compilers and suddenly seeing the bugs that had hidden inside it for decades...

On top of all this, the ICs used have undefined behavior too: the instruction decoders have empty entries, certain settings lead to different results not described by the datasheet, etc.

Once these two are combined, programs that work perfectly on the native platform suddenly glitch on an emulation that upholds the standard 100%.

To remedy that, we need to emulate the undefined stuff properly as well, which is hard, as it usually requires remembering previous states of the HW (like undefined CPU flags, certain memory locations, etc.).

As the stuff is undefined, we need to research it first on real HW, and that usually takes years...

This leads to related stuff/issues like:

  • floating bus
  • undefined flags (see the sketch after this list)
  • undefined instructions
  • timing glitches
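As one concrete, well-researched example of an "undefined flags" rule: on the Z80, flag bits 3 and 5 (often called XF/YF) are copied from bits 3 and 5 of the result of most ALU operations, and ZEXALL checks exactly this kind of detail. A sketch:

```cpp
#include <cstdint>

// Undocumented Z80 flag bits 3 (XF) and 5 (YF) follow the result of
// most ALU operations; an emulator that ignores them fails ZEXALL.
uint8_t apply_undocumented_flags(uint8_t flags, uint8_t result) {
    const uint8_t XF = 0x08, YF = 0x20;
    flags = static_cast<uint8_t>(flags & ~(XF | YF));
    flags |= result & (XF | YF);   // copied straight from the result
    return flags;
}
```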




2. Timing

The easiest and fastest way to properly scale the emulation speed is clock ticks, and that is what most emulators use. But this creates big problems when emulating stuff like:

  • IC interconnection
  • parallelism
  • sharing

It usually boils down to a common term: contention. With clock ticks, the emulation has no real correlation with contention, and you need to emulate it with hard-coded tables, usually on a per-instruction basis (see the sketch below).
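A hard-coded contention table for the 48K ZX Spectrum typically looks like the sketch below. The 6,5,4,3,2,1,0,0 pattern is the commonly documented one, but treat the details (and the omitted visible-area check) as assumptions to verify against real hardware:

```cpp
// Extra T-states the ULA steals from the CPU when it touches
// contended RAM while the beam is in the visible area.
int contention_delay(long tstate_in_frame, bool contended_address) {
    static const int pattern[8] = {6, 5, 4, 3, 2, 1, 0, 0};
    if (!contended_address) return 0;
    // ... a real table also checks the beam position within the frame ...
    return pattern[tstate_in_frame % 8];
}
```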



There are also other methods of scaling time. I am using machine-cycle-level timing, which means my instructions are divided into basic CPU operations like read memory, read IO, write memory... just as the native CPU would do them. This solves a lot of contention problems on its own, as the emulation has the state of the buses at specific times instead of just the output of the instruction. However, such timing is much slower, as it has bigger overhead.

For more info see Question about cycle counting accuracy when emulating a CPU.




3. HW quality

Most emulators fake peripherals with cheap hacks. For example, the screen is just an image, but on an old system it would be displayed on a CRT, and many effects exploit that, like doubling the y resolution, etc. The same goes for video signal generation: if it is not emulated properly (handled as just a frame instead), the output will not be correct and many effects will not work as they should (you know: border effects, noise, snow, blurs, etc.).

This goes for any HW, like FDC/FDD, tapes, keyboards, etc., so with cheap hacks no custom loader will work...

Putting all this together, it is a lot of stuff to code and research, even for a simple platform. As emulators are usually hobby projects, there is not the time/manpower/will to do it all, and compromises are made...

Recently there has been a new way of emulating that is 100% HW-correct, but it is still very slow, and a dedicated emulation HW engine might be a good starting point. The idea is to take an image of the die of the emulated IC and emulate the semiconductor itself. For more info see:

  • Visual 6502






            share|improve this answer


























              Your Answer








              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "648"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              noCode: true, onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fretrocomputing.stackexchange.com%2fquestions%2f10828%2fwhat-makes-accurate-emulation-of-old-systems-a-difficult-task%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              9 Answers
              9






              active

              oldest

              votes








              9 Answers
              9






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              68














              Speaking from my personal experience of writing a PET emulator, a C64 emulator and a Sinclair Spectrum emulator,, here are the issues I had:



              Getting the Speed Right



              It's no good just making a processor go as fast as it can because, frequently, application code depends on timing. For old 8-bit machines, it's easy to write an emulator that runs at many times the speed of the original. The trouble is that has a knock on effect. For example, the PET Space Invaders program goes way too fast to be playable. Not that it matters because its key scanning code is similarly speeded up which means when you press a key, your gun is uncontrollable.



              The same issue applies to the C64. The interrupt is driven by one of the IO chips which needs to be synchronised to the CPU's clock. This means pressing a key for even a brief time is the same as pressing it for several seconds on a real C64.



              So you need to throttle the performance to something like the original speed. Unfortunately, that means having a clock with microsecond accuracy. Such things do not really exist in modern general purpose PC operating systems. Your CPU thread just has to be scheduled out of the processor and it will miss several microseconds probably. One way to get around this is to raise some event every 1/60th of a second (the probable refresh rate of your monitor and handily NTSC TVs) and when the CPU has executed 1/60th of a second's worth of instructions, just make it wait until the event occurs. Unfortunately, that makes doing sound on a PET or a Spectrum difficult because they both rely on the CPU toggling bits in IO registers at the right frequency.



              Parallelism



              In a real computer, there are several components that all operate concurrently. For example, the C64 has a VIC II chip for the display, a sound chip, some IO chips ands a 6510 and they are all synchronised by the same clock. The easiest way to deal with this is to have a loop in which you execute an instruction in the CPU and then update all the other components with the new clock time. Unfortunately, this is by nature serial and you have to be careful about do complex stuff in case it makes your emulation too slow.



              An alternative is to put each component in its own thread, taking advantage of the multiple cores of modern computers, but then you have the problem of synchronisation. All of your components will need to access your emulation of the memory bus and they all need an access to the same copy. So, you might emulate the clock with a boolean that is toggled every 0.5 microseconds (in emulation time, see above) by the CPU. Unfortunately, modern processor cores have caches between themselves and the main memory. If the thread emulating the CPU core toggles a boolean representing the clock, it may only actually be altering the cached version of that variable and the other components won't see it. There are OS functions that allow you to force the cached version of a variable to main memory, but they incur a significant performance penalty. It's about 100 times slower to access main memory than L1 cache.



              Documentation



Documentation for old computers can be quite hard to find and may not be detailed enough for constructing an accurate emulator. For example, if you want an accurate Z80 emulation, you need to understand that there is an undocumented internal register, usually called W or WZ (also known as MEMPTR), which affects the behaviour of some of the undocumented Z80 instructions. In theory you don't need to care about those, but in practice some popular game might have used them. The behaviour of the W register has been painstakingly reverse engineered by enthusiasts, but sometimes they get it wrong.



              The other problem with old documentation is that it frequently contains errors. A popular book on the 6502 was the Zaks book, Programming the 6502. I remember that my Dad's copy of it was festooned with hand written annotations correcting all the errors that he discovered by bitter experience.



              Graphics



Getting the graphics right is pretty hard. I started by just taking a dump of the graphics memory every 1/60th of a second and drawing it in a window. I progressed to doing that on the GPU, but it is still not right. C64 programmers were adept at changing the graphics mode on the fly so they could use mixed modes on the screen. Even the Spectrum effect of the rapidly moving stripes in the border while a tape is loading is done by rapid changes to the background colour. You can't just snapshot the state and render it every 1/60th of a second; you effectively have to know the state at the end of every scan line on the VDU. In fact, on the C64 I believe it was possible to split the screen vertically by carefully timed mode changes during a scan line. I haven't solved that yet.
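A sketch of the per-scan-line approach (interfaces invented for illustration): run the CPU for one line's worth of cycles, then draw that line using whatever the video registers contain at that instant.

    // Interleave CPU execution and line drawing so mid-frame register
    // changes land on the correct scan line. All classes are stubs.
    #include <array>
    #include <cstdint>

    constexpr int LINES = 312, CYCLES_PER_LINE = 63;   // PAL C64 timing

    struct Cpu {
        void run(long cycles) { (void)cycles; }        // execute ~cycles
    };
    struct Vic {
        uint8_t mode = 0;                              // set by reg writes
        void renderLine(int y, std::array<uint32_t, 320>& px) {
            (void)y;
            px.fill(mode);                             // draw with current regs
        }
    };

    int main() {
        Cpu cpu; Vic vic;
        std::array<uint32_t, 320> line{};
        for (int y = 0; y < LINES; ++y) {
            cpu.run(CYCLES_PER_LINE);                  // one line of CPU time
            vic.renderLine(y, line);                   // then draw that line
        }
    }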



              Sound



Timing actually matters more for sound than for graphics. A film is projected at 24 frames per second and our brains easily fill in the gaps; try something similar with sound and you'll soon notice. For this reason, I haven't even attempted to emulate sound on the PET or the Spectrum. The C64 should be easier, because it had a sound chip you sent commands to rather than an output wire that had to be toggled very fast, but I haven't done that bit yet.



              Development Tools



              You'll need to create test programs for your emulation which means having a development suite. I was lucky in that Z80 and 6502 are both relatively well supported. Other architectures are not so fortunate. Not finding a good toolchain for 68000 stopped me from bothering with that architecture.





              Responses to Comments



              I woke up this morning and mistook the number of comments I had as my accumulated overnight score. (I nearly fell off my chair when I saw what my actual score was, thank you.) There are many points to answer which I don't feel I can do in the comments easily, so I will put my answers here.




              I feel [your answer] understates how big of an issue parallelism is ~ Austin Hemmelgarn




Yes it does. But I have so far only attempted to write emulators for systems from the 1980s, which tend to be slow by today's standards and also very simple. Thus you can get away with serialising a lot of the parallel components and, in fact, that might be desirable, given that an unthrottled emulation usually runs too fast anyway.



              Ironically, I have found that components that run off different clocks are sometimes easier to work with because the protocols between them cannot rely on clock synchronisation, so there is usually some hand shaking mechanism that means they don't rely on absolute timings.



As a simple example that expands on Luaan's reply, the PET interrupt is driven by the vertical sync of the monitor (or appears to be), so I simply had a task, triggered by the actual monitor refresh 60 times per second, that raises an interrupt. This meant the keyboard scanned at the right speed and the cursor blinked at the right speed even though the emulated CPU was running at about 80 MHz. By contrast, on the C64 the same interrupt is driven by a timer in one of the IO chips, which is itself driven from the CPU clock, so the emulated C64 looks like it runs ridiculously fast.




              This is only true if you decide to suspend your thread. And you actually miss somewhere around 1 millisecond which makes thread sleeping very undesirable for game dev in general. You can create your desired delay without releasing the thread. This allows you to get microsecond precision. ~~ Chris Rollins




On a modern general-purpose PC, you have no choice about when your threads get suspended. My laptop has 8 cores and 16 hyperthreads, but there are far more threads than that running on it at any one time. Most are suspended at any given moment, but it's quite possible for the emulated CPU thread to be pre-emptively suspended during periods of high load without you doing anything. Furthermore, in order to time things at the microsecond level, you need a microsecond-accurate clock. Note: precision is not accuracy. My laptop has a nanosecond-precision clock, but it would be a stretch to assume you can measure time periods accurately to the nanosecond with it. For one thing, reading it requires a system call, and system calls are relatively expensive and also lay you open to having your thread rescheduled. Your mileage may vary depending on your operating system.
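For what it's worth, the busy-wait trick Chris describes is usually a hybrid like the sketch below: sleep through the coarse part of the delay, then spin for the last stretch. It buys precision at the cost of a busy core, and the scheduler can still preempt you at any point.

    // Hybrid wait: sleep most of the interval (coarse granularity),
    // then busy-spin the remainder for sub-millisecond precision.
    #include <chrono>
    #include <thread>

    void precise_wait_until(std::chrono::steady_clock::time_point deadline) {
        using namespace std::chrono;
        if (deadline - steady_clock::now() > 2ms)
            std::this_thread::sleep_until(deadline - 2ms); // coarse part
        while (steady_clock::now() < deadline) { }          // spin the rest
    }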




              Don't modern sound cards buffer sound data so you don't have to bit-bang them? ~ snips n snails




I had thought of counting the number of toggles in a certain period and counting the CPU clock cycles to derive a frequency to send to the sound card, but I don't know enough about programming modern sound cards to say whether that is reasonable or would give good fidelity. I guess you don't actually need it for a Spectrum or a PET :) . Having just read the next comment, by Ilmari Karonen: that is essentially the same approach, though I think you'd need a shorter period than a frame (I assume you mean a video frame).
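A sketch of that record-and-replay idea (names invented; a real emulator would hand the resulting buffer to a buffered audio API): log each speaker toggle with its cycle timestamp during the frame, then resample the log into PCM afterwards.

    // Reconstruct one frame of 1-bit "beeper" audio from a log of
    // speaker toggles recorded with cycle timestamps.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    constexpr std::int64_t CPU_HZ = 3'500'000;      // Spectrum Z80 clock
    constexpr std::int64_t SAMPLE_RATE = 44'100;

    struct Toggle { std::int64_t cycle; bool level; };

    std::vector<std::int16_t> renderAudio(const std::vector<Toggle>& log,
                                          std::int64_t frameCycles) {
        std::vector<std::int16_t> pcm;
        bool level = false;
        std::size_t i = 0;
        const std::int64_t samples = frameCycles * SAMPLE_RATE / CPU_HZ;
        for (std::int64_t s = 0; s < samples; ++s) {
            const std::int64_t cycle = s * CPU_HZ / SAMPLE_RATE;
            while (i < log.size() && log[i].cycle <= cycle)
                level = log[i++].level;             // replay toggles so far
            pcm.push_back(level ? 8000 : -8000);    // crude square-wave PCM
        }
        return pcm;
    }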




              Without a barrier, your later loads don't wait for your earlier stores to be visible to other threads, ~ Peter Cordes




Yes, I admit I was playing fast and loose with the actual mechanics of cache and memory synchronisation in the interests of brevity. The point is that ensuring correct sequencing of memory updates requires special, expensive synchronisation operations such as memory barriers. This is over and above the penalty of going to main memory relative to the cache.






– JeremyP (answered yesterday, edited 17 hours ago)





















              • 8





                This is a good answer, but I feel it understates how big of an issue parallelism is. A lot of systems (especially game consoles) not only have multiple components running in parallel, but often on multiple differing clocks, with completely different code translation required. For example, a Sega Saturn has 9 separate 'core' chips to emulate (2 SH-2 chips, one m68k, one SH-1, two custom video chips, a custom synth/DSP for audio, and a custom MCU mediating all the busses), covering 5 differing instruction sets running at four different clock speeds.

                – Austin Hemmelgarn
                yesterday






              • 4





                You're wrong about something important here. "having a clock with microsecond accuracy. Such things do not really exist in modern general purpose PC operating systems. Your CPU thread just has to be scheduled out of the processor and it will miss several microseconds probably." <- This is only true if you decide to suspend your thread. And you actually miss somewhere around 1 millisecond which makes thread sleeping very undesirable for game dev in general. You can create your desired delay without releasing the thread. This allows you to get microsecond precision.

                – Chris Rollins
                yesterday






              • 1





                Could it be practical to handle sound (and possibly other bit-banging effects) by recording changes to I/O registers (with cycle counts for relative timing) while running the emulated code for a frame, and then "play them back" to reconstruct the 1/60 second audio waveform (and possibly any graphical effects) that those changes would have produced? This would cause the audio to be delayed slightly, but at worst it would be by one frame, which shouldn't be too noticeable (especially if the graphics are delayed by the same time as well).

                – Ilmari Karonen
                yesterday






              • 4





Speaking of Space Invaders, on the original the reason the enemies got faster when there were fewer of them was that drawing them all on the screen took less time (not because the game was designed to do so, but the designer intentionally didn't take it out). So not only do you want the game to run at the correct speed, you also want to ensure it still properly speeds up towards the end.

                – Captain Man
                yesterday






              • 2





                Your description of cache vs. multithreading is based on a common misconception that other cores can still load "stale" values from their cache after another core modifies its cached copy of a line. Actually CPUs use (a variant of) MESI to ensure cache coherency. See Is mov + mfence safe on NUMA?. The actual problem is memory reordering introduced by the store buffer, before stores have committed to L1d cache. Without a barrier, your later loads don't wait for your earlier stores to be visible to other threads, so you only get acq/rel not seq_cst

                – Peter Cordes
                yesterday
















              21














              The best description of this problem that I've seen was written by Byuu, one of the developers of the bsnes emulator. That's about as authoritative as they come, but I'll try to summarize it here.



Modern software development is a completely different world from retro consoles. There were no reusable game engines or manufacturer-provided APIs. Everything was written from scratch and ran on the bare metal, and the code was deeply tied to the raw hardware, including aspects of the hardware that were unintentional. The article has an example of a game that relied on the SNES processor's unusual behavior in a rare edge case; if that edge case is not perfectly emulated, or if the timing is even slightly off, the game deadlocks.

Consoles frequently had multiple chips inside (CPU, audio processor, graphics, etc.) that all ran at different speeds. Emulators have to emulate all of these precisely, down to the exact delays required for one part of the hardware to send a result back to another. Not doing it exactly right can result in audio and video getting out of sync, graphics being drawn during the wrong frame, and so on. The only way to accurately emulate timing at that level of detail is to run on hardware significantly faster than the original.



              Cartridge-based games frequently made this problem worse. The cartridge or disk for a modern game is merely some sort of memory containing software. Retro game cartridges frequently contained additional hardware as well, anything from extra graphics accelerators to memory mappers. These have to be emulated in order for the software to work, and many of these have no publicly-available documentation.



              Some games also took advantage of the way that old CRT screens drew images. They would do things like modify the registers that controlled graphics output while in the middle of painting a scanline on the screen, in order to create specific visual effects. These visual effects either don't work at all on modern LCD screens or produce very different results unless the emulation engine can detect what the code is trying to do and compensate for it.
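A sketch of the consequence for renderers: since the CPU can rewrite video registers mid-frame, register state has to be latched per scanline rather than once per frame. Names and dimensions are illustrative:

```cpp
#include <array>
#include <cstdint>

struct VideoRegs { uint16_t scroll_x = 0; uint8_t backdrop = 0; };

VideoRegs regs;                        // the CPU may write these at any time
std::array<uint8_t, 240 * 320> frame;  // 240 scanlines of 320 pixels

void render_scanline(int line) {
    VideoRegs latched = regs;  // snapshot the registers for this line only
    for (int x = 0; x < 320; ++x)
        frame[line * 320 + x] = latched.backdrop;  // real code would fetch
                                                   // pixels via latched.scroll_x
}

void render_frame(void (*run_cpu_for_one_line)()) {
    for (int line = 0; line < 240; ++line) {
        render_scanline(line);    // uses whatever the CPU has set up so far
        run_cpu_for_one_line();   // CPU may change regs before the next line
    }
}
```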



I encourage you to read the article, because I'm only giving a rough summary, but the TL;DR is this: most emulators focus on the "ideal" hardware behavior, and most games will run well enough to be basically playable, though with hundreds of obscure bugs. The only way to make every game play exactly the way it played on real hardware is to emulate the inner workings of every single undocumented, proprietary chip in the console, quirks and bugs included. And that's really, really hard.






answered yesterday by bta
















              • 4





                +1 for the best answer. I had this exact article in mind when I saw the question.

                – Mason Wheeler
                yesterday











              • "Everything was written from complete scratch and ran on the bare metal. The code was deeply tied into the raw hardware, including aspects of the hardware that were unintentional." - exactly this.

                – Stilez
                yesterday






              • 1





Standard note of caution: the linked article is an individual using a public platform to try to explain to an audience what he thinks are his personal achievements and the motivations for them. So although it's better than, say, a politician's Twitter feed, because there was an editor involved, it's still not especially objective.

                – Tommy
                yesterday











              • 'proprietary' strikes again, +1

                – Mazura
                23 hours ago
















              18














              Almost all computer systems have multiple different devices operating in parallel. For example, the CPU is typically running parallel to the video generation hardware.



              This can be very difficult to emulate accurately, as interactions between different devices require that the emulation of each device proceeds in sync with the rest. Each device may be running on a different clock, and it becomes necessary to accurately emulate things like bus contention and DMA slot allocation.
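One concrete form of bus contention, sketched under assumed names: while a DMA transfer owns the bus, the CPU simply doesn't advance, so CPU timing depends on what the other devices are doing:

```cpp
struct Bus { int dma_cycles_left = 0; };

// Advance the whole system by one bus cycle.
void tick(Bus& bus, void (*cpu_cycle)(), void (*dma_cycle)()) {
    if (bus.dma_cycles_left > 0) {  // DMA holds the bus: the CPU is stalled
        dma_cycle();
        --bus.dma_cycles_left;
    } else {
        cpu_cycle();                // the CPU only runs when the bus is free
    }
}
```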



              There is also the issue of simply knowing how the original system actually behaves. Even now emulator developers are still finding new behaviours of classic hardware such as the Amiga and C64. Documentation is usually incomplete and back when those machines were popular developers would experiment with using the hardware in unintended ways, with results that were undocumented and sometimes not even consistent across revisions of the same machine.



Analogue processes are also very difficult to emulate accurately. For example, the Amiga generates sound using pulse-width modulation and multiple analogue filters. Such systems can be measured and modelled, but producing an accurate reproduction digitally is tricky. It gets even worse with systems that had built-in speakers or displays: to sound and look right, the properties of those must also be modelled.
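A sketch of the digital-modelling side: a 1-bit PWM stream run through a one-pole RC low-pass filter, which is about the simplest stand-in for an analogue output stage. The cutoff is an assumed parameter; a faithful model needs values measured from real hardware:

```cpp
#include <cmath>
#include <vector>

// One-pole low-pass: y += a * (x - y), with 'a' derived from the cutoff.
std::vector<float> filter_pwm(const std::vector<int>& pwm_bits,
                              double sample_rate, double cutoff_hz) {
    const double kPi = 3.141592653589793;
    const double a = 1.0 - std::exp(-2.0 * kPi * cutoff_hz / sample_rate);
    std::vector<float> out;
    out.reserve(pwm_bits.size());
    double y = 0.0;
    for (int bit : pwm_bits) {  // each bit is 0 or 1 from the PWM stream
        y += a * (bit - y);
        out.push_back(static_cast<float>(y));
    }
    return out;
}
```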



There are also issues with modern systems that make accurate emulation difficult. Input lag is a major one. Older systems tended to read keyboard and joystick inputs as simple switches, resulting in minimal lag. Modern PCs use USB, often polled at the default 125 Hz rate (one sample every 8 ms). There is also often lag between the PC and the display output, whereas older systems had near-zero lag between the hardware generating a video signal and it appearing on the monitor.



              Monitor refresh rates are also a problem. PC monitors rarely go below 60Hz and graphics cards rarely offer arbitrary refresh rates. Most older machines were either 50Hz (PAL/SECAM) or 59.94Hz (NTSC) so there is a mismatch between one emulated frame and one frame displayed on the host PC. Arcade systems often had odd frame rates such as 57.5Hz too, which even analogue TVs and monitors tend to struggle with.
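The mismatch is easy to quantify. A back-of-envelope sketch for an NTSC-rate guest on a 60 Hz host (pure arithmetic, no display API involved):

```cpp
#include <cstdio>

int main() {
    const double guest_hz = 60000.0 / 1001.0;  // NTSC field rate, ~59.94 Hz
    const double host_hz  = 60.0;
    // The guest falls behind by (host_hz - guest_hz) frames every second, so
    // a frame must eventually be shown twice to stay in sync.
    double secs_per_dup = 1.0 / (host_hz - guest_hz);
    std::printf("duplicate one frame every %.1f s (every %.0f host frames)\n",
                secs_per_dup, secs_per_dup * host_hz);  // ~16.7 s, ~1001 frames
}
```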






answered yesterday by user
























              • "It gets even worse when you consider systems that had build in speakers or displays, but to sound/look right the properties of those must also be considered." - one example is the original non-backlit GameBoy Advance, which caused colours to be blurry and less bright, so some GBA emulators have a setting to make the colours less colourful and blurrier. Even Nintendo's official hardware had this problem with the successors that had backlit screens, and they didn't bother to correct it.

                – immibis
                yesterday








              • 2





                @immibis, a better example would be NTSC artifact colors: rapid brightness changes in the framebuffer turn into color changes on the screen.

                – Mark
                yesterday
















              12














              I'm the author of an emulator of a whole bunch of platforms, so I'm going to go ahead and commit a heresy: accurate emulation of old systems isn't a difficult task. It's just an expensive one.



It's expensive in terms of processing, which means that a lot of the more historical emulators make conscious approximations in order to hit their performance budgets. These usually involve making things falsely atomic — e.g. implementing a processor as something that (i) performs an instruction instantaneously; and then (ii) time-warps to where it should have been had it spent time doing the processing. This is the sort of thing that has visible side effects as soon as there are any other chips in the system that work as a function of time.



              Trivial example: a 6502 read-modify-write cycle is likely to: (i) read the value to mutate; (ii) write it back again, without modification (while the processor is working on the real result); then (iii) write it back modified.



Suppose you were doing something like mutating a video display processor's status register: the fact that the original value gets rewritten, and that the meaningful write doesn't occur until two cycles after the read, might be important. So if you're not genuinely serialising bus interactions, you're likely to be in trouble.
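The same example as a per-cycle bus trace, sketched in code (the memory array and function names are mine). Collapsing the three bus accesses into one atomic increment is exactly the "falsely atomic" shortcut, and it changes what a memory-mapped register observes:

```cpp
#include <cstdint>

uint8_t memory[65536];

uint8_t bus_read(uint16_t addr)             { return memory[addr]; }
void    bus_write(uint16_t addr, uint8_t v) { memory[addr] = v; }  // may hit a register

// The final three bus cycles of a 6502 INC abs-style instruction.
void inc_abs(uint16_t addr) {
    uint8_t v = bus_read(addr);                    // cycle n:   read original value
    bus_write(addr, v);                            // cycle n+1: dummy write-back, unmodified
    bus_write(addr, static_cast<uint8_t>(v + 1));  // cycle n+2: write the real result
}
```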



              The term is primarily marketing puff but this is the sort of thing that scenesters refer to as 'cycle accuracy'.



              You can optimise a large part of it away but usually only by introducing substantial coupling within your code — e.g. by having the processor know what's in different places in the memory map, exactly like real processors don't, and possibly having it communicate via a bunch of winks and nods. So then the cost is on maintainability and future code scalability.



              It's expensive in terms of development. Not just initial, but subsequent. This partly cuts to the serialisation point that others have made: how do you make it look as though multiple components are executing at the same time?



              In the most trivial case, you just implement them as coroutines and round robin between them. In the olden days you might use a very loose precision for that scheduler. Nowadays you might use whatever is the machine's necessary minimum unit of time. This is how you end up at a really slow emulator: quite a lot of time is spent on the scheduling itself, since it happens frequently, and you're trashing your machine's caches by jumping around between sections of code and their working data sets.
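A sketch of that round-robin, minimum-time-unit shape under assumed names, mostly to show where the time goes: the dispatch loop itself runs millions of times per emulated second:

```cpp
#include <vector>

struct Component {
    virtual void run_one_tick() = 0;  // advance this chip by one master-clock tick
    virtual ~Component() = default;
};

void run(std::vector<Component*>& parts, long long ticks) {
    for (long long t = 0; t < ticks; ++t)  // millions of iterations per emulated second
        for (Component* c : parts)
            c->run_one_tick();  // dispatch overhead and cache churn on every tick
}
```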



              There are threading solutions but it's a judgment call which is appropriate in which situation, and they're not necessarily pleasant on your host CPU as spin locks tend to be the order of the day. Alternatively you can schedule optimistically, assuming component isolation, and just repeat work if you were wrong, but then you've got to get into the weeds of full state preservation for potential restoration.



I tend towards sequence points, i.e. identifying when it's predictable that two components can run in isolation for a period without any inaccuracy accruing, and then letting them do so, but then that prediction mechanism is a whole extra chunk of code that you need to spend a decent amount of time working on.
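A sketch of the sequence-point idea under assumed names: compute the earliest upcoming cross-component event and let the CPU run to it in one uninterrupted batch, rather than being rescheduled every tick:

```cpp
#include <algorithm>
#include <cstdint>

struct Cpu   { uint64_t now = 0; void run_until(uint64_t t) { /* batch-execute */ now = t; } };
struct Video { uint64_t next_vblank = 29780; };  // illustrative event times
struct Timer { uint64_t next_irq   = 11932; };

void run_chunk(Cpu& cpu, const Video& video, const Timer& timer) {
    // Until the earliest predictable interaction, the CPU can't observe the
    // other components, so it is safe to run it in isolation.
    uint64_t safe_until = std::min(video.next_vblank, timer.next_irq);
    cpu.run_until(safe_until);
}
```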



              Historically it was expensive in terms of research. This cuts to the topics raised by others of figuring out exactly what components are really doing before you can implement them, which tends to differ from the documented behaviour. But for most things that the modern processing budget permits to be low-level emulated you're now talking twenty or thirty years of enthusiasts poking at them and documenting their findings. So a lot of this was difficult work, but it's difficult work that the rest of us can just parasitically consume.






answered yesterday by Tommy




























                      8














                      In nearly every computer, you have several things going on in parallel, even if it is just code execution on a CPU and screen refresh by the graphics card. In most cases, emulating behaviour of things that happen serially is quite easy, but as soon as you have to synchronize parallel actions, emulation gets difficult.



I once hacked on a Game Boy emulator, and one of the performance-limiting factors of the video emulation was that games do change video controller parameters during scan-out, e.g. to change the background colour between the score bar and the game screen, or to change the start offset of the screen scan-out to keep a fixed score bar above or below a scrolling game screen. This means that, in the general case, you have to draw each scanline separately and take into account the current video parameters to render it correctly, even though the graphics chip works with 8x8 tiles and (if the software did not change parameters) you could generate 8 lines at once with less overhead.



In many cases, changing video parameters is actively synchronized in CPU code, either by using a scanline-match interrupt (though possibly not on the Game Boy; I forget whether it has one) or by polling the "current scanline" register. In these cases, the emulator can provide synchronization by fudging values in the current-scanline register to "probe" what value the software is waiting for, or by learning the first scanline to apply the parameters from the interrupt configuration. But in some cases programmers just counted CPU cycles, so the emulator needs to know how many scanlines have elapsed between the latest synchronization point and the current point in time.
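One common shape for this synchronisation, sketched with assumed names, is "catch-up": let the video emulation lag behind the CPU and only bring it up to date when the CPU touches a video register, so mid-frame writes still land on the correct scanline:

```cpp
#include <cstdint>

struct Machine {
    uint64_t cpu_cycles = 0;    // how far the CPU has executed
    uint64_t video_cycles = 0;  // how far the display has been rendered

    void catch_up_video() {
        while (video_cycles < cpu_cycles) {
            // render_next_piece_of_scanline();  (hypothetical)
            ++video_cycles;
        }
    }
    void write_video_reg(uint8_t reg, uint8_t value) {
        catch_up_video();  // render everything the *old* register value covers
        // apply_register(reg, value);  -- takes effect from this exact cycle
        (void)reg; (void)value;
    }
};
```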






answered yesterday by Michael Karcher


























                              8














There has been a drift in the goals of video game emulators over the past decade or two. At first, getting emulated software to run at all was a major success, and was often done with simplified kludges and hacks. As the resources available to emulators have improved, the focus has moved more toward accurate reconstruction of what the original does. But why is that difficult to do?



Well, if software were written to follow a standard API and use documented system calls at all times, things would be fairly straightforward: just design a modern implementation of that API, and everything should work fine. But that's clearly not the case.



Sometimes these standardised interfaces don't meet the needs of programmers. Perhaps the functionality they want isn't provided (e.g. the PC BIOS's primitive graphics handling routines). Or maybe the "official" method is too slow, and you need the fastest performance you can get. This is often the case with computer games, which are of course the most popular type of software for emulation.



At this point, the programmer would typically bypass the API and address the hardware directly. And this is where the problems start.* Through trial and error, programmers will find, use, and rely on hardware behaviour that may not have been intended or known of by the system's designers.



                              To accurately emulate a piece of hardware (for example a graphics processor or I/O chip) you need to know what it does, and how it does it. If you're using an off-the-shelf component, such as a 6522 VIA (I/O chip), then there will be some public documentation available. But if the programmer has found some undocumented functionality (or the system designer has written the operating system according to how the physical hardware behaves, and not documented it exhaustively themselves) then there's no easy way for emulator writers to know how the game+system actually works.



One approach is to use a black-box method. You give some input to a physical system, and see what output you get. But that is limited by your ability to think of all possible relevant situations, and to go off and test them all.
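In practice, black-box testing often takes the form of automated test programs: feed the same stimuli to the real machine and to the emulator, and compare the observable results. A toy sketch of the comparison side (the probe functions stand in for whatever hardware interface is available):

    #include <stdint.h>
    #include <stdio.h>

    /* Placeholders: one reads back a result from the real machine
       (e.g. via a serial link), the other queries the emulator. */
    uint8_t probe_hardware(uint16_t addr, uint8_t stimulus);
    uint8_t probe_emulator(uint16_t addr, uint8_t stimulus);

    /* Sweep a range of inputs and report every divergence. The hard
       part is not this loop but inventing stimuli that reach the
       undocumented corner cases at all. */
    int compare_behaviour(uint16_t addr)
    {
        int mismatches = 0;
        for (int s = 0; s < 256; s++) {
            uint8_t hw = probe_hardware(addr, (uint8_t)s);
            uint8_t em = probe_emulator(addr, (uint8_t)s);
            if (hw != em) {
                printf("addr %04x stimulus %02x: hw=%02x emu=%02x\n",
                       addr, s, hw, em);
                mismatches++;
            }
        }
        return mismatches;
    }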



                              Another is to open the chip up (a process known as decapping) and photograph the transistors on the die, layer by layer. If you use these images to reconstruct the "circuit diagram" of the chip, you could manufacture your own duplicate chip, or simulate it in software. The resources required to decap a chip are significant, as you need to shave off tiny layers of the packaging, or use acids to eat away at it, before using a specialist camera to photograph the tiny die at high resolution. And once that's done, you have a jigsaw that consists of thousands upon thousands (or even millions) of transistors that need to be traced out and joined up.



                              Once you know the exact structure of the chip you're trying to emulate, you then need to simulate the behaviour of every transistor in the chip. And, depending on the accuracy you're trying to achieve, you might eventually be attempting to simulate the behaviour of electrons themselves. At this small scale, we're into the territory of quantum behaviour, too.



At which point the question arises: how accurate do you want your emulator to be?



                              *These problems affect backward compatibility as well as emulation. If you write code that bypasses the OS and addresses the sound chip directly, then it may not work if the manufacturer uses a different sound chip in the next model, or simply moves it to a different memory location.
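To make the footnote concrete: on the PC, bypassing the API often literally meant writing to I/O ports at hard-coded addresses instead of calling the BIOS. The fragment below shows the spirit of it, using the conventional AdLib register/data ports 0x388/0x389 (the outb wrapper is sketched in GCC inline-assembly style; DOS-era compilers had their own equivalents):

    #include <stdint.h>

    /* x86 port output; real DOS-era code also had to busy-wait a few
       microseconds between writes to this particular chip. */
    static inline void outb(uint16_t port, uint8_t val)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
    }

    /* Write one register of an AdLib-compatible FM chip directly.
       If a later machine moves or replaces the chip, this silently
       breaks - exactly the backward-compatibility problem above. */
    void fm_write(uint8_t reg, uint8_t val)
    {
        outb(0x388, reg);   /* select the register */
        outb(0x389, val);   /* write its value     */
    }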






answered by Kaz
Aside from the great points made previously, emulating on anything but dedicated hardware runs into real-time OS issues. No current consumer operating system operates in strict real time: your tasks get access to resources as the OS sees fit. This is usually fairly prompt, but sometimes the OS may cause a glitch while it is busy doing something else.



Considering that some retrogamers avoid LCD monitors because they (apparently) have noticeable latency, OS-induced timing issues can be a major problem.
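The jitter is easy to observe: ask the OS to sleep for one frame's worth of time and measure how late the wake-up actually is. A small self-contained example for a POSIX system:

    #include <stdio.h>
    #include <time.h>

    /* Sleep for ~16.7 ms (one 60 Hz frame) repeatedly and print how
       far the actual wake-up drifts from the request; on a
       non-real-time OS the overshoot varies from run to run. */
    int main(void)
    {
        const long frame_ns = 16700000L;
        struct timespec req = {0, frame_ns}, t0, t1;

        for (int i = 0; i < 10; i++) {
            clock_gettime(CLOCK_MONOTONIC, &t0);
            nanosleep(&req, NULL);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            long late = (t1.tv_sec - t0.tv_sec) * 1000000000L
                      + (t1.tv_nsec - t0.tv_nsec) - frame_ns;
            printf("frame %d: woke up %ld ns late\n", i, late);
        }
        return 0;
    }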






answered by scruss
Usually, the gap is between how things should work ideally in the digital domain and how they actually work in the real, physical, analog domain.



For example, take the SID audio chip from the Commodore 64. It has reasonable documentation of how it works, and people have been reverse-engineering how it produces sound, so we should be able to write a model of it for an FPGA that runs exactly like the original chip would, clock cycle by clock cycle, or write a matching program that runs on a computer.



But that is only the digital logic portion. Audio signals are converted from digital bits to analog voltages inside the chip, and that is by no means an ideal process: it picks up noise from nearby signals and may have non-linear distortion, and analog signals have a limited speed when transitioning from one voltage to another. The SID chip even has analog filters built in, and external capacitors are needed to set the filter's operating parameters. The chips can have large manufacturing tolerances in their parameters; for example, the digital control of the analog filter is said to vary a lot, and while the documentation says it is linear, it may actually be a logarithmic control, and the range may vary just due to chip manufacturing tolerances.
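Even a crude model of the analog side makes the tolerance problem visible: a one-pole low-pass filter is only a few lines of code, but its cutoff has to be treated as a per-chip parameter rather than a constant. A sketch (this is not the actual SID filter topology, which is considerably more complex):

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* One-pole low-pass: y += a * (x - y). On real chips the
       effective cutoff varies with manufacturing tolerances, so an
       emulator would have to make cutoff_hz configurable per chip. */
    typedef struct { double a, y; } lowpass;

    void lowpass_init(lowpass *f, double cutoff_hz, double sample_hz)
    {
        f->a = 1.0 - exp(-2.0 * M_PI * cutoff_hz / sample_hz);
        f->y = 0.0;
    }

    double lowpass_step(lowpass *f, double x)
    {
        f->y += f->a * (x - f->y);
        return f->y;
    }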



So while the digital part of the chip runs at (approximately) a 1 MHz clock, properly emulating the analog signal chain inside and outside the chip may require a much smaller timescale for good results, which requires a faster computer. Even if the analog audio were emulated at the (approximately) 1 MHz clock, it is still necessary to convert it to, for example, 48 kHz for playback on your computer. There are various ways to do the resampling: just skipping samples would be very fast but would sound awful, while the theoretically correct way would be too slow. As always, it is a trade-off, and some simplifications may be necessary to strike a balance between not too slow and not too bad sounding.
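A middle-ground resampler that often sounds acceptable is to average all of the ~1 MHz samples that fall into each 48 kHz output slot (box filtering), instead of picking every Nth sample. A sketch:

    #include <stddef.h>

    /* Downsample by averaging: each output sample is the mean of the
       high-rate samples in its time slot. Crude compared to a proper
       windowed-sinc filter, but far better than skipping samples. */
    size_t resample_box(const float *in, size_t in_len,
                        double in_rate, double out_rate, float *out)
    {
        double ratio = in_rate / out_rate;  /* ~1000000/48000 = 20.8 */
        size_t out_len = 0, start = 0;

        while (start < in_len) {
            size_t end = (size_t)((out_len + 1) * ratio);
            if (end > in_len) end = in_len;
            float sum = 0.0f;
            for (size_t i = start; i < end; i++)
                sum += in[i];
            out[out_len++] = (end > start) ? sum / (float)(end - start)
                                           : 0.0f;
            start = end;
        }
        return out_len;
    }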



And that is only the audio chip. Imagine emulating a chip that outputs analog video. The problem is much less well defined, as sometimes it is not only what the video chip outputs, but how a color TV would actually decode the colors from it; the problem is more like "what would you see on a color TV fed with the generated signal?". The best example is how to show the roughly 4000 artifact colors an NTSC TV produces from the composite output signal of a CGA video adapter that officially has at most 16 colors.



When computers become fast enough and GPU acceleration can be used, maybe we can skip all that and stop thinking about how to emulate some logic that happens to be on a silicon die: maybe we can just scan the structures on the silicon and run a purely electrical simulation of the semiconductor structures, so we don't have to know what it is that we are emulating, but it would just work.






answered by Justme
I wrote my own ZX Spectrum emulator with machine-cycle-level code timing and a 100% ZEXALL pass, so here are some insights:





                                                      1. Undefined behavior



Back in the day there were no sophisticated compilers like we have now; they just compiled your code into a binary. So there were a lot of hidden bugs in the code, even in programs that worked fine, that today's compilers would complain about right away.



I still remember my surprise when porting my old code to newer compilers and suddenly seeing bugs that had been hidden inside for decades ...



On top of all this, the ICs used have undefined behavior too: the instruction decoders have empty entries, certain settings lead to different results that are not described in the datasheet, etc ...



Once these two are combined, programs that work perfectly on the native platform suddenly glitch on an emulator that upholds the documented standard 100%.



To remedy that, we need to emulate the undefined stuff properly too, which is hard, as it usually needs to remember previous states of the HW (like undefined CPU flags, certain memory locations, etc.).



As the stuff is undefined, we need to research it first on real HW, and that usually takes years ...



This leads to related issues like:




                                                        • floating bus

                                                        • undefined flags

                                                        • undefined instructions

                                                        • timing glitches




                                                      2. Timing



The easiest and fastest way to properly scale the emulation speed is counting clock ticks, and that is what most emulators use. But this creates big problems when emulating things like:




                                                        • IC interconnection

                                                        • parallelism

                                                        • sharing


It usually boils down to a common term: contention. With clock ticks, the emulation has absolutely no correlation with contention, and you need to emulate it with hard-coded tables, usually on a per-instruction basis.



There are also other methods of scaling time. I am using machine-cycle-level timing, which means instructions are divided into basic CPU operations like memory read, I/O read, memory write ... just as the native CPU would perform them (a minimal sketch of this appears at the end of this answer). This solves a lot of contention problems on its own, as the emulation has the state of the buses at specific times instead of just the output of the whole instruction. However, such timing is much slower, as it has bigger overhead.



                                                        For more info see Question about cycle counting accuracy when emulating a CPU




                                                      3. HW quality



Most emulators fake peripherals with cheap hacks. For example, the screen is just an image, but on the old systems it would be displayed on a CRT, and there were many effects that exploited that, like doubling the y resolution, etc ... The same goes for video signal generation: if it is not emulated properly (handled as just a frame instead), the output will not be correct and many effects will not work as they should (you know: border effects, noise, snow, blurs, etc.).



This goes for any HW, like FDC/FDD, tapes, keyboards, etc ... so with cheap hacks, no custom loader will work, ...



Putting all this together, it is a lot of stuff to code and research, even for a simple platform. As emulators are usually hobby projects, there is not the time/manpower/will to do it all, and compromises are made ...




Recently there is a new way of emulating that is 100% HW-correct, but it is still very slow, and maybe some dedicated emulation HW engine would be a good starting point. The idea is to take an image of the die of the emulated IC and emulate the semiconductor itself. For more info see:




                                                      • Visual 6502
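To make the machine-cycle timing from point 2 concrete, here is a minimal sketch of the idea: each bus access advances the master clock by its own length plus any contention, instead of the whole instruction being charged at once. The names are hypothetical, not taken from the author's emulator:

    #include <stdint.h>

    extern uint64_t t_cycles;                  /* master clock           */
    uint32_t contention_delay(uint16_t addr);  /* e.g. ULA-contended RAM */
    uint8_t  raw_read(uint16_t addr);
    void     raw_write(uint16_t addr, uint8_t v);

    /* Each access advances the clock by its own length plus wait
       states, so the bus state is correct at every point inside the
       instruction, not only at its end. */
    uint8_t mc_read(uint16_t addr, uint32_t len)
    {
        t_cycles += contention_delay(addr);  /* wait states, if any */
        t_cycles += len;                     /* the access itself   */
        return raw_read(addr);
    }

    void mc_write(uint16_t addr, uint8_t v, uint32_t len)
    {
        t_cycles += contention_delay(addr);
        t_cycles += len;
        raw_write(addr, v);
    }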






answered by Spektre