Most bit efficient text communication method?
What would be the most efficient text communication method for a spacecraft operating on a super low bit rate (I'm talking something like 5 bits an hour, excluding error handling)?
As you want both complexity (the full English language and numbers) and speed (letters per day), resorting to something like Morse code seems the most obvious solution, but are there any other options out there?
space-travel communication interstellar-travel
asked 16 hours ago by Exostrike
2
$begingroup$
What type of spacecraft are we looking at here, and what data does it need to communicate?
$endgroup$
– Cadence
16 hours ago
15
$begingroup$
What's the relationship between bits and Morse code? At five bits per hour you cannot reasonably communicate anything other than predefined status messages. (English has an entropy of about 1.3 bits per character, and a typical word is 4 or 5 characters. At 5 bits per hour with optimal compression you can send about 20 words per day of unconstrained text. This is too low, so in practice you will want to predefine a number of status messages and send an index into the message table.)
$endgroup$
– AlexP
16 hours ago
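The estimate in the comment above can be checked with a few lines of arithmetic. This is only a sketch: the entropy figure and word length are the ones quoted in the comment, and the 4.5-character average (ignoring spaces) is an assumption made here for illustration.

```python
# Back-of-the-envelope throughput, using the figures quoted in the comment.
BITS_PER_HOUR = 5
ENTROPY_BITS_PER_CHAR = 1.3   # estimate quoted in the comment
CHARS_PER_WORD = 4.5          # assumed midpoint of "4 or 5 characters"


def words_per_day(bits_per_hour=BITS_PER_HOUR):
    """Words of ideally compressed English that fit into one day of bits."""
    bits_per_day = bits_per_hour * 24
    chars_per_day = bits_per_day / ENTROPY_BITS_PER_CHAR
    return chars_per_day / CHARS_PER_WORD


print(round(words_per_day(), 1))  # about 20 words per day
```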
2
$begingroup$
If it's 5 bits per hour excluding error handling, you're probably better off dropping error handling and using those extra bits for the message. A mildly scrambled message is still better than not being able to send the message at all (usually).
$endgroup$
– Jack Aidley
13 hours ago
2
$begingroup$
Why 5 bits per hour? You can manage way better with radio today, which is one of the lowest-tech options for space communication. Radio is still pretty poor, but it provides way better speeds than "5 bits an hour".
$endgroup$
– opa
10 hours ago
1
$begingroup$
@opa I am assuming OP has FTL communication, it's just slow. Your simple radio will take decades or centuries to send a signal.
$endgroup$
– Andrey
6 hours ago
10 Answers
$begingroup$
The most efficient communication is probably a command set. Since you contemplated Morse code, I assume that the communication is done via a fully defined interface - both sender and receiver know what a bit sequence is supposed to mean.
A command set is no more than giving different codes predefined meanings. With one single bit you can define 2 commands:
| value | meaning |
| 0 | light off |
| 1 | light on |
With 4 bits you can define 16 different commands, with 1 byte (8 bits) 256 commands, with 2 bytes 65,536 commands, and so on. If all you really need is to display texts to an astronaut, you can store a set of ready-made texts like "Activate X-ray sensors" in a database on board and send the corresponding message ID from Earth. For more complex messages you can store text templates in the database and then compile a message from several templates.
An early real-world example is the list of Q-Codes, created circa 1909, by the British government as a "list of abbreviations... prepared for the use of British ships and coast stations licensed by the Postmaster General".
If you need to communicate more than simple texts, you would separate a message into a command part and a message part. You could, for example, tell the space ship:
Activate X-ray sensors
By sending a signal of 2 bytes:
| byte | value | meaning |
| 1 | 01 | activate appliance |
| 2 | 08 | X-ray sensor array |
Communication with an astronaut would be possible with a different command:
| byte | value | meaning |
| 1 | 04 | write to terminal |
| 2 | 08 | text with ID 8 |
That would result in slightly longer commands, but the possibilities of what you can achieve with a few bytes are multiplied.
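A minimal sketch of the two-byte scheme from the tables above. The table contents mirror the answer's examples; the dictionary names and the rule "command 01 takes an appliance, anything else takes a text ID" are assumptions made for illustration.

```python
# Hypothetical lookup tables; the IDs mirror the tables above.
COMMANDS = {0x01: "activate appliance", 0x04: "write to terminal"}
APPLIANCES = {0x08: "X-ray sensor array"}
TEXTS = {0x08: "text with ID 8"}


def encode(command_id, argument_id):
    """Pack a command and its argument into two bytes."""
    return bytes([command_id, argument_id])


def decode(message):
    """Translate a two-byte message back into its meaning."""
    command_id, argument_id = message
    # Assumption: the command byte decides which table the second byte indexes.
    argument_table = APPLIANCES if command_id == 0x01 else TEXTS
    return COMMANDS[command_id], argument_table[argument_id]


print(decode(encode(0x01, 0x08)))
print(decode(encode(0x04, 0x08)))
```

Both ends need identical tables, which is the price paid for sending only 16 bits.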
If you have a really big database with a whole lot of different texts, it might be more efficient to terminate commands with a defined code. For this approach, the database must be sorted in a way that gives the most frequent commands the lowest ID.
Let's define 0000 as the terminator.
- For a very common command with the ID 6, you need to send the command's ID followed by the terminator: 0110 0000.
- A very uncommon command with the ID 26683 would look like this: 0110 1000 0011 1011 0000.
The advantage is that you can have commands of dynamic lengths (instead of sending a whole bunch of useless 0's to fill up the static length of a command).
The disadvantage is that every command is longer than it could ideally be. So this approach only gets worthwhile when you need a great many commands.
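The terminated-ID idea can be sketched as follows. One detail the scheme needs is what to do with IDs that happen to contain a 0000 group, since the receiver would stop early; the sketch simply refuses to assign such IDs, which is an assumption made here and not something the answer specifies.

```python
TERMINATOR = 0b0000


def encode_id(command_id):
    """Return the ID as 4-bit groups followed by the 0000 terminator."""
    if command_id <= 0:
        raise ValueError("ID must be positive")
    nibbles = []
    while command_id:
        nibbles.append(command_id & 0xF)
        command_id >>= 4
    nibbles.reverse()
    if TERMINATOR in nibbles:
        # Assumption: IDs containing a 0000 group are never assigned.
        raise ValueError("ID contains a 0000 group and would end early")
    return " ".join(f"{n:04b}" for n in nibbles + [TERMINATOR])


def decode_id(bit_groups):
    """Read 4-bit groups until the terminator and rebuild the ID."""
    value = 0
    for group in bit_groups.split():
        nibble = int(group, 2)
        if nibble == TERMINATOR:
            return value
        value = (value << 4) | nibble
    raise ValueError("no terminator received")


print(encode_id(6))       # 0110 0000
print(encode_id(26683))   # 0110 1000 0011 1011 0000
```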
After defining your command set, the next step is to make sure that you received the correct message. Losing just a single bit can change a message of "Activate X-ray sensors" into "Destroy X-ray sensors" or similar. This is usually done with a checksum, which requires some more bits to transmit.
Have a look at the difference between two common data transmission protocols for the internet: UDP and TCP. UDP is the most efficient in respect to transfer rate, but TCP trades some efficiency for reliability by including some overhead for error checking.
$endgroup$
6
$begingroup$
+1. If you have even the slightest notion of the kinds of things you want to say you can vastly reduce the amount of information required. “Tell [astronaut] that their [medical property] is [phrase detailing concerns]” could theoretically only take a couple of bytes to transmit if the parser is cleverly designed.
$endgroup$
– Joe Bloggs
15 hours ago
19
$begingroup$
So, basically, Q codes, created circa 1909 by the British government, to be used by maritime ships (both civilian and military) for precisely the reason in the question. As a nice bonus, language is irrelevant, so long as each party has a lookup table in their own language.
$endgroup$
– Chronocidal
15 hours ago
8
$begingroup$
1 byte -> 15 commands, 2 bytes -> 255 commands...what? Why not 256 and 16384?
$endgroup$
– genesis
15 hours ago
$begingroup$
You could take AS Interface as a reference. A small scale industrial communication bus system
$endgroup$
– Alexander von Wernherr
14 hours ago
2
$begingroup$
If using the terminated ID approach Huffman encoding could aid in ensuring the most common commands are the more efficient. Also, you might want to add to the protocol a mechanism for including parameters with the command as a separate parameter command could easily take more space than simply allowing each command to have an arbitrary number of parameters.
$endgroup$
– FluxIX
12 hours ago
$begingroup$
According to Schneier, the entropy of English text is below 1.6 bits per letter. Given a difficult constraint such as yours, I would expect people to come up with compression algorithms that get close to that.
If you don't need the full power of English you might get much better compression if you can pre-define a small set of words that would be sufficient. Something similar in principle to https://xkcd.com/1133/
I think you need to answer two important questions:
- Is the system pre-defined, i.e. can there be word-lists?
- Are characters/words sent individually or can you apply compression to a large amount of data and then send it in bulk?
If you want something that is simple, sciency and requires no setup, go with Huffman-coding individual letters based on frequency in English. ;)
$endgroup$
1
$begingroup$
As you can see with XKCD, it gets very wordy to convey a simple idea. It may still work better than ASCII.
$endgroup$
– Andrey
6 hours ago
$begingroup$
@Andrey I'm not sure I would call rocket science a "simple idea".
$endgroup$
– TheHansinator
3 hours ago
1
$begingroup$
@Andrey Though, in all seriousness, the words chosen for the compressed language would probably be chosen specifically for the domain - e.g. the language chosen for a spaceship probably would include words like "rocket".
$endgroup$
– TheHansinator
3 hours ago
$begingroup$
You might look at digital modes for amateur radio here. Some of those modes use what's called "varicode" -- where different characters have different symbol lengths (Morse code is a varicode system -- more commonly used letters are shorter in terms of transmission time). When sending English text, a varicode will minimize the number of bits required for a sufficiently large sample (which reasonably ought to include a large number of messages). If "text speak" is used commonly, it might make sense to design the varicode used around letter frequencies in that particular text format.
If longer messages are common, some form of compression would make sense -- text typically compresses well with common compression algorithms, but the compression headers make this inefficient for very small blocks of data (text or otherwise).
$endgroup$
$begingroup$
Textspeak
SMS messages originally were 160 characters so textspeak evolved to reduce everything down to the most compact form through abbreviations, acronyms and emoticons.
Sounds like a good reason to send teenagers into space....
$endgroup$
$begingroup$
I'm pretty sure that style of writing predates common SMS usage. I recall half the people on the internet talking that way in the late 90s, a time when most people did not have internet but even fewer people had the ability to send text messages.
$endgroup$
– Aaron
8 hours ago
$begingroup$
Was previously beeper speak: 143 133 43 43 5318008
$endgroup$
– RIanGillis
6 hours ago
$begingroup$
Building on other answers
In addition to the different encoding and compression methods, one thing to look into is shorthand techniques that allow you to drop letters while still being able to interpret the message. Some examples:
- it, to -> t
- is -> s
- have -> hv
- cat -> ct
- are -> r
Example sentence: hw r u?
An alternative approach
Encode your information in time delays
Presumably there is some reason that you can't speed up the data transmission, but perhaps you can slow it down. At 5 bits per hour, that's 12 minutes between bits. Instead of sending each bit at a regular interval, you can delay the transmission of bits and use the delay time as a means of conveying information.
So let's say you expect a minimum of 12 minutes between each bit, you can encode the data as follows (time is in mm:ss format):
- 12:00 = 0
- 12:05 = 1
- 12:10 = 2
- 12:15 = 3
- etc
The more data you encode, the fewer bits per day you'll be able to transmit, so there will be some optimal balance you'll have to figure out based on the minimum delay interval you consider acceptable. Then you can perhaps use the bits themselves as an error checking mechanism, or to still transmit data.
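The trade-off described above can be put into numbers. This sketch uses the 12-minute base gap and 5-second step from the list; the assumption that all delay values are equally likely, and the function names, are introduced here for illustration.

```python
import math

BASE_GAP_S = 12 * 60   # minimum spacing between bits, from the answer
STEP_S = 5             # spacing between distinguishable delays, from the answer


def delay_for(value):
    """Seconds to wait before sending the next bit in order to convey `value`."""
    return BASE_GAP_S + STEP_S * value


def value_from(delay_s):
    """Recover the value from a measured delay, rounding to the nearest step."""
    return round((delay_s - BASE_GAP_S) / STEP_S)


def bits_per_hour(levels):
    """Average throughput with `levels` equally likely delay values.

    Each transmission carries the bit itself plus log2(levels) bits of timing.
    """
    average_gap = BASE_GAP_S + STEP_S * (levels - 1) / 2
    return (1 + math.log2(levels)) * 3600 / average_gap


for levels in (1, 4, 16, 64, 256):
    print(levels, round(bits_per_hour(levels), 1))
```

With a single level this reduces to the original 5 bits per hour. Adding levels raises the rate at first, because the information per transmission grows faster than the average wait, but the gain depends entirely on how finely the receiver can time the arrivals.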
$endgroup$
$begingroup$
Not Morse code
From Wikipedia:
International Morse code is composed of five elements:[1]
- short mark, dot or "dit" (▄▄▄▄): "dot duration" is one time unit long
- longer mark, dash or "dah" (▄▄▄▄▄▄): three time units long
- inter-element gap between the dots and dashes within a character: one dot duration or one unit long
- short gap (between letters): three time units long
- medium gap (between words): seven time units long
If we use one bit to store one unit of information, it takes four bits to transmit even the shortest letter ('e') and its subsequent gap. The next shortest are 'i' and 't' at six bits. Then 'a', 'n', and 's' at eight. The longest character in the Morse alphabet is 0, which requires five dashes or twenty-two units/bits. And that only supports the thirty-six-character Latin alphanumeric alphabet.
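The unit counts above follow directly from the timing rules in the quoted list. A small sketch, with a deliberately partial Morse table covering only the characters mentioned:

```python
# Partial Morse table: just the characters discussed above.
MORSE = {"e": ".", "i": "..", "t": "-", "a": ".-", "n": "-.", "s": "...", "0": "-----"}


def morse_units(char):
    """Time units for one character, including the trailing three-unit letter gap."""
    code = MORSE[char]
    marks = sum(1 if symbol == "." else 3 for symbol in code)
    gaps_inside = len(code) - 1   # one unit between dots and dashes
    return marks + gaps_inside + 3


for char in "eitans0":
    print(char, morse_units(char))
```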
Morse is designed around humans. Humans do better with indeterminate length than fixed length, as we don't have good timing ability (we can't tell a five unit pause from a four unit pause consistently). But if these messages are being transmitted computer to computer, computers have great ability at timing. We can use superior fixed length formats. Heck, even with humans, twelve minute long units means that it is easy to track whether you're getting a pause or a dot (a zero or a one).
It gets even worse if you are transmitting Morse over bits, because (extended) ASCII's eight bits per character are more efficient unless the message is composed entirely of the letters in 'eitans'.
Bits
Meanwhile, if we transmit ASCII, we could transmit a 0 with eight bits. If we break things into nybbles, we can transmit one nybble with a checksum bit every hour. So two hours to transmit one character with some error detection included. Or ninety-six minutes without the checksums.
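One way to read the nybble-plus-checksum idea is a single even-parity bit per nybble, giving a 5-bit frame per hour. The frame layout and function names below are assumptions for illustration; a single parity bit detects any one flipped bit in a frame but cannot correct it.

```python
def frames_for(char):
    """Split an 8-bit character into two 5-bit frames: 4 data bits + even parity."""
    byte = ord(char)   # assumes an (extended) ASCII character, i.e. below 256
    frames = []
    for nibble in (byte >> 4, byte & 0xF):
        parity = bin(nibble).count("1") % 2
        frames.append(f"{nibble:04b}{parity}")
    return frames


def char_from(frames):
    """Rebuild the character, raising if either frame fails its parity check."""
    byte = 0
    for frame in frames:
        if frame.count("1") % 2 != 0:
            raise ValueError(f"parity error in frame {frame}")
        byte = (byte << 4) | int(frame[:4], 2)
    return chr(byte)


print(frames_for("0"))             # ['00110', '00000']
print(char_from(frames_for("0")))  # 0
```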
If we instead use ten bits (two hours), we can do something like Lempel-Ziv. So the first 256 characters are the extended ASCII set. The remaining 768 symbols actually represent multiple characters. So common sequences (e.g. "the ", "ing", and "tion") would have their own ten-bit representation, e.g. 0100000000. This allows the full flexibility of ASCII while also producing a shorter message on average.
The Lempel-Ziv algorithm builds the dictionary from the message itself. We can do better by agreeing on a dictionary beforehand. You can also use this to integrate the error correction and the dictionary, which improves your effective speed.
Numbers are generally going to be better sent as bits than as characters. I.e. instead of sending ASCII 3840, just send 111100000000. That's only twelve bits, hardly more than a single character.
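The saving is easy to quantify. This sketch compares the two representations; note that it ignores how the receiver learns where a binary number ends (a fixed width or a length prefix would be needed), which is an assumption left open here.

```python
def bits_as_text(number):
    """Bits needed when each decimal digit is sent as an 8-bit ASCII character."""
    return 8 * len(str(number))


def bits_as_binary(number):
    """Bits needed when the number is sent as plain binary."""
    return max(1, number.bit_length())


print(format(3840, "b"), bits_as_binary(3840), bits_as_text(3840))
# 111100000000 12 32
```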
$endgroup$
$begingroup$
A receiver will pick up the radio signal plus background noise (most notably cosmic background radiation). Generally, the received noise power is greater for greater receiver bandwidth. So to get a good signal-to-noise ratio (SNR), one can transmit the radio signal within a very narrow frequency band and put a very narrow band-pass filter on the front of the receiver.
EXAMPLE: The receiver was picking up 1 microwatt of radio signal and 1 milliwatt of noise power with a 1 MHz bandwidth (so an SNR of 0.001).
Dropping the bandwidth to 10 Hz would leave 1 microwatt of radio signal power but only 10 nanowatts of received noise power (so an SNR of 100).
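The arithmetic in this example can be reproduced in a couple of lines, under the same simplifying assumptions the answer makes: noise power scales linearly with bandwidth, and the signal still fits entirely inside the narrower filter.

```python
def snr_after_filter(signal_w, noise_w, old_bw_hz, new_bw_hz):
    """SNR after narrowing the receiver filter.

    Assumes noise power is proportional to bandwidth and that the whole
    signal remains inside the new passband.
    """
    filtered_noise_w = noise_w * new_bw_hz / old_bw_hz
    return signal_w / filtered_noise_w


before = snr_after_filter(1e-6, 1e-3, 1e6, 1e6)   # about 0.001
after = snr_after_filter(1e-6, 1e-3, 1e6, 10)     # about 100
print(before, after)
```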
Consider a protocol like PSK31 (or similar), as used by ham radio operators, instead of Morse code.
PSK31 uses pure tones of relatively long duration to send 1s and 0s. The longer those tones are, the narrower the filter at the receiver can be. PSKxxx can be used to send low-data-rate messages across the planet using only a few watts of power.
Another alternative (though more complex) is using long strings of physical 1s and 0s to represent a single symbol in the protocol. This method is used by GPS, for example. The GPS signal is normally about 30X lower in power than the background noise, but by correlating long strings of 1,023 bits the receiver is, on average, able to lock onto the signal.
EXAMPLE: Define a long sequence of physical 1s and 0s for each letter of the alphabet. Each code is very different from the other codes.
Let A be 00101010 10001010 10100101 00101010 ...
Let B be 10100001 10100101 00010101 00010100 ...
Let C be 01001010 01010100 00010100 00110101 ...
The sequences may be thousands of bits long if you want. The patterns are generated by a computer automatically when the user types a letter on the keyboard.
The physical bit sequences are sent at a much higher rate than the actual symbols. For example, if you want to send one symbol per second and your sequences are 1000 bits long, then you send the physical bits at 1000 bits per second.
When receiving the signal plus noise, the noise will cause the receiver to make the wrong decision on the physical 1s and 0s some percentage of the time. The receiver stores the received bit pattern and compares it to each of the codes. The receiver then selects the code which most closely matches the received pattern. Even if a large fraction of the received bits are wrong, the received pattern is likely to match the code sent by the transmitter more closely than any of the other codes. Thus the receiver can determine what the transmitter sent even if the background noise is much higher than the received radio signal.
Another advantage of using long codes is that the codes inherently correct physical bit errors at the receiver. Also, different transmitters can each use different code sets so they can talk at the same time (this approach is how CDMA cell phones work).
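The matching step can be sketched as a nearest-code search. Everything here is illustrative: the code book is random bits from a seeded generator (real systems pick sequences with known correlation properties), and the 30% bit error rate is an arbitrary choice.

```python
import random

CODE_LENGTH = 1000
rng = random.Random(42)   # fixed seed so both ends can derive the same code book

# One pseudo-random bit sequence per letter (hypothetical code book).
CODE_BOOK = {
    letter: [rng.randint(0, 1) for _ in range(CODE_LENGTH)]
    for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
}


def transmit(letter, error_rate, noise):
    """Return the letter's code with each bit flipped with probability error_rate."""
    return [bit ^ (noise.random() < error_rate) for bit in CODE_BOOK[letter]]


def decode(received):
    """Pick the letter whose code differs from the received bits in the fewest places."""
    def distance(letter):
        return sum(a != b for a, b in zip(CODE_BOOK[letter], received))
    return min(CODE_BOOK, key=distance)


noisy = transmit("B", error_rate=0.3, noise=random.Random(7))
print(decode(noisy))
```

With about 300 of 1000 bits flipped, the received pattern still sits far closer to the transmitted code than to any unrelated code, which differs in roughly half its positions. The approach stops working as the error rate approaches 50%.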
$endgroup$
$begingroup$
Yeah, something similar to PSK31, or maybe JT65, FT8 or some such for their low S/N characteristics, was my first thought as well. One benefit of it, and many similar encoding schemes (I'm not sure I really want to call it a modulation per se, though one could make an argument that it's a baseband modulation) is that they use variable-length encodings. That requires some kind of synchronization, but allows the more commonly used code points to be encoded more compactly and thus transmitted more quickly.
$endgroup$
– a CVn♦
9 hours ago
$begingroup$
- Encode whole words instead of single letters.
- Use Huffman encoding based on word frequency in the specific context of space travel, so that frequent words ('the', 'yes', 'shields') take fewer bits than less frequent words.
- Use Markov chains to take the context of the sentence into account as well.
$endgroup$
$begingroup$
Oops! When I was writing this, I forgot you said "5 bits per hour" and was thinking "5 bits per day"... read all instances of "day" below as "hour". I'm leaving this message temporarily in case I missed any instances.
Here is the literal answer to your question:
Use base64 character encoding. This allows you to represent the English characters, including numbers, which is precisely what you said you wanted, using 6 bits per character, which is just 1 bit more than fits into 1 hour's worth of transmission in your circumstance.
And here are some enhancements to that by adding special "modes"...
This includes both upper and lower case letters. If you are fine with restricting yourself to one case, which would still fulfill your requirements, then you would have room left to include more punctuation or other enhancements (up to 26 other enhanced transmission modes). I would recommend using some of this extra space to represent some extremely common words or short phrases that you would use very, very often. Then use a few of the character slots for other special meanings, such as "the next few bytes represent status codes" or "the following data is compressed".
Mode 1: Table of most common words or phrases
For the examples below, I'll assume that 2 of the characters represent different word/phrase lookup tables using 9 bits each, since this allows the lookup to take exactly 3 hours to send, including the initial 6 bits (6 + 9 = 15 bits = 3 hours). This allows for 512 entries per lookup table, times 2; that is, 1024 different shortened words or phrases.
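The costs in this mode are simple to tally. A small sketch using the answer's figures (6 bits per plain character, 6 + 9 bits per lookup, 5 bits per hour); the function name and parameters are made up for illustration.

```python
BITS_PER_HOUR = 5
CHAR_BITS = 6          # one base64-style character
LOOKUP_BITS = 6 + 9    # mode character plus 9-bit table index


def cost(plain_chars=0, lookups=0):
    """Return (hours, leftover bits) needed to send a message."""
    bits = plain_chars * CHAR_BITS + lookups * LOOKUP_BITS
    return divmod(bits, BITS_PER_HOUR)


print(cost(lookups=1))                 # (3, 0): one lookup takes exactly 3 hours
print(cost(plain_chars=7))             # (8, 2): seven plain-text characters
print(cost(plain_chars=3, lookups=2))  # (9, 3): two lookups plus three characters
```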
Using this format...
"Hey Bob" as plain text requires 6*7=42 bits = 8 hours + 2 bits
"communication array damaged by [reason]", assuming "communication array" and "damaged" are both in the lookup table, would take 9 hours + 3 bits + [however long it takes to send the reason]. "communication array damaged by Klingon torp" would take 24 hours - less if either "Klingon" or "torp" were sent as lookup words instead of as plain text.
Mode 2: Look-back
This is a "repeat recent word" mode. In computer science, it has been shown that recently used data is among the most likely data to be used next, and that is what a PC memory cache is for. We can do something similar by making 1 of the character slots represent "The next 4 bits refer to a previous word; count back that many words in the most recent transmitted data."
With this, "Klingon fleet approaches from 294 and Klingon admiral on comms saying Klingon destroyers equipped with new black hole tech" allows you to shorten 2 instances of "Klingon" to exactly 2 hours worth of data each; the first one providing "0110" (6) as the 4-bit-lookback value and the second instance being "0101" (5). In some communications this could save a lot of time if words are repeated often. Note that pronouns like "their" could have been used in some places, but that would take 7 plaintext characters (including surrounding spaces), which in this case would have taken 6 hours + 2 bits longer to send.
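The look-back values in this example can be derived mechanically. A minimal sketch, assuming words are split on spaces and that a 4-bit offset reaches at most 15 words back; the tuple format for back-references is invented here for illustration.

```python
LOOKBACK_BITS = 4
MAX_OFFSET = 2 ** LOOKBACK_BITS - 1   # at most 15 words back


def encode_lookback(sentence):
    """Replace a repeated word with ("back", n) if it occurred n <= 15 words earlier."""
    words = sentence.split()
    encoded = []
    for index, word in enumerate(words):
        window = words[max(0, index - MAX_OFFSET):index]
        if word in window:
            # Distance to the most recent earlier occurrence.
            last_seen = max(i for i, w in enumerate(window) if w == word)
            encoded.append(("back", len(window) - last_seen))
        else:
            encoded.append(word)
    return encoded


message = ("Klingon fleet approaches from 294 and Klingon admiral on comms "
           "saying Klingon destroyers equipped with new black hole tech")
print([item for item in encode_lookback(message) if isinstance(item, tuple)])
# [('back', 6), ('back', 5)]
```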
Mode 3: Copy/paste, possibly with separate paste-buffers
This would allow customized shortcuts that were not thought of before launch, a sort of copy/paste. "start copying here", then continue the message, then a "stop copying here"... then later you can send a "paste" character to repeat a long message. "comms good, thrusters good, life support good, magnetic-artificial-gravity [copy]working intermittently due to a swarm of flies in the grav-capacitor[end copy] from a meal someone left out for days", then you only have to send that long text 1 time, and each time after that you send "comms good, thrusters good, life support good, magnetic-artificial-gravity [paste]", and you do that until it changes back to "good". Also, "comms", "thrusters", "life support", "magnetic-artificial-gravity", and "good" might all be in the lookup table, meaning this entire message takes 22 hours + 1 bit to send after the first time you send it. Even better is if you make this "paste mode" be followed by a few bits for a "paste buffer number". Then you could "[copy1]comms good, [etc.], [copy2]working intermittently due to...[end-copy-2][end-copy-1] from a meal that..." Then every time you want to send an updated status, if it's the same as the one before it takes 1 hour plus a few bits.
You can tweak the exact representation (different send modes, number of bits for each, etc.) to improve performance further based on your expected communications.
Mode 4: Compression
Just what it sounds like. The following data uses a given compression algorithm.
Multiple versions of each mode
If you cut out lower case and still have character slots left over after implementing all the punctuation and modes you want, you can use the leftover character slots to expand modes that you already have. This is similar to what I did above with the lookup tables: I suggested using 2 of the base64 character slots that were freed up by tossing lower case to give us 2 separate lookup tables. You could similarly double your look-back reach or double your number of copy/paste buffers. You can also increase or decrease the number of bits following a mode character, such as having 4 bits after copy/paste to get 16 paste buffers, or only 2 extra bits to save on transmission time but allow only 4 paste buffers.
So how efficient is this?
Worst case scenario this requires 6 bits per character. Average case scenario you will use a few lookups, look-backs, or some compression to beat the worst case, so you require 3-5 bits per character. Best case is messages that can be relayed entirely, or nearly entirely, by lookups and look-backs, which will be often for normal day to day activities that go as expected - for such common communication, if you have a well tuned number of bits for each special mode, you should achieve better than 1 bit per character. Many times much, much better than 1 bit per character, such as with the status report example a couple paragraphs up.
$endgroup$
$begingroup$
Your modes 1-3 are already part of common LZ-derived compression algorithms. The "table of common phrases" strategy, for example, is exposed by zlib through methods to set a pre-agreed "dictionary". (Some additional tweaking to the algorithm could probably be applied to make it squeeze the last few bits out of short messages, though.)
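As a rough sketch of the preset-dictionary idea using Python's standard `zlib` module: the `zdict` argument to `zlib.compressobj` and `zlib.decompressobj` is the real API, while the phrase list in `SHARED_DICT` and the helper names are invented for illustration.

```python
import zlib

# Hypothetical phrase table agreed on by both sides before launch.
SHARED_DICT = (b"comms good, thrusters good, life support good, "
               b"magnetic-artificial-gravity good")


def compress_with_dict(message: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    """Deflate a message, letting it refer back into the shared dictionary."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(message) + compressor.flush()


def decompress_with_dict(data: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    """Inverse of compress_with_dict; needs the very same dictionary."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(data) + decompressor.flush()


message = b"comms good, thrusters good, life support good"
packed = compress_with_dict(message)
plain = zlib.compress(message, 9)  # same message without a shared dictionary
```

The dictionary itself is never transmitted, which is where the saving on short messages comes from.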
$endgroup$
– Henning Makholm
5 hours ago
$begingroup$
Huffman Encoding
Basically, you want the same kind of methodology that .zip files use today. We take the most common character in the text (probably 'e') and say that it simply corresponds to the bit '1'. Then the next most common one ('a' maybe?) will be '01', and the next most common (let's say 't') will be '001'.
So, given this system, "eat" = "101001", while "tea" = "001101".
A Huffman code built from the real character frequencies is the most efficient code of this one-code-per-character kind, as it gives you access to any number of characters while still using very few bits for the vast majority of the ones you actually use.
Note though: this is most effective when some letters/characters are used far more than others (as is the case in modern English).
Also, most .zip files send along a "dictionary" of bit combinations and characters so that the receiver can decode them. This can be wasteful to send every time, especially for short messages. However, if every user has a well-known dictionary that is built to best represent common English usage, it can work.
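A minimal sketch of building such a code table in Python, assuming the frequencies are taken from a sample text both sides already share. The function names are illustrative, and the exact bit patterns it produces will differ from the hand-picked '1', '01', '001' example above.

```python
import heapq
from collections import Counter


def huffman_code(sample: str) -> dict:
    """Build a prefix-free code table from character frequencies in sample."""
    freq = Counter(sample)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (total count, unique tie-breaker, {char: partial code}).
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        n2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text: str, code: dict) -> str:
    return "".join(code[ch] for ch in text)


def decode(bits: str, code: dict) -> str:
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return "".join(out)


sample = "eat tea at a teahouse"
table = huffman_code(sample)
bits = encode(sample, table)
```

Because no code word is a prefix of another, the decoder needs no separators between characters.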
$endgroup$
$begingroup$
The most efficient communication is probably a command set. Since you contemplated Morse code, I assume that the communication is done via a fully defined interface - both sender and receiver know what a bit sequence is supposed to mean.
A command set is no more than giving different codes predefined meanings. With one single bit you can define 2 commands:
| value | meaning |
| 0 | light off |
| 1 | light on |
With 4 bits you can define 16 different commands, with 1 byte (8 bits) 256 commands, with 2 bytes 65,536 commands, and so on. If all you really need is to display texts to an astronaut, you can store a bunch of ready-made texts like "Activate X-ray sensors" in a database and send the corresponding message ID from Earth. For more complex messages you can store text templates in a database and then compile a message from several templates.
An early real-world example is the list of Q-Codes, created circa 1909, by the British government as a "list of abbreviations... prepared for the use of British ships and coast stations licensed by the Postmaster General".
If you need to communicate more than simple texts, you would separate a message into a command part and a message part. You could, for example, tell the space ship:
Activate X-ray sensors
By sending a signal of 2 bytes:
| byte | value | meaning |
| 1 | 01 | activate appliance |
| 2 | 08 | X-ray sensor array |
Communication with an astronaut would be possible with a different command:
| byte | value | meaning |
| 1 | 04 | write to terminal |
| 2 | 08 | text with ID 8 |
That would result in slightly longer commands, but the possibilities of what you can achieve with a few bytes are multiplied.
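The two tables above could be decoded along these lines. This is only a sketch: the byte values come from the tables, but the wording stored under text ID 8 is a hypothetical placeholder (the answer does not say what that text is), and the function name is made up.

```python
# Byte 1: what to do. Values taken from the tables above.
COMMANDS = {0x01: "activate appliance", 0x04: "write to terminal"}
# Byte 2 when byte 1 is 0x01: which appliance.
APPLIANCES = {0x08: "X-ray sensor array"}
# Byte 2 when byte 1 is 0x04: which stored text (content here is hypothetical).
TEXTS = {0x08: "Activate X-ray sensors"}


def decode_command(packet: bytes) -> str:
    """Turn a 2-byte packet into a human-readable instruction."""
    if len(packet) != 2:
        raise ValueError("expected exactly 2 bytes")
    command, argument = packet
    if command == 0x01:
        return f"{COMMANDS[command]}: {APPLIANCES[argument]}"
    if command == 0x04:
        return f"{COMMANDS[command]}: {TEXTS[argument]}"
    raise ValueError(f"unknown command byte {command:#04x}")
```

The same second byte (08) means different things depending on the first byte, which is what lets a small number of bytes cover many instructions.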
If you have a really big database with a whole lot of different texts, it might be more efficient to terminate commands with a defined code. For this approach, the database must be sorted in a way that gives the most frequent commands the lowest ID.
Let's define 0000 as the terminator.
- For a very common command with the ID 6, you need to send the command's ID followed by the terminator: 0110 0000.
- A very uncommon command with the ID 26683 would look like this: 0110 1000 0011 1011 0000.
The advantage is that you can have commands of dynamic lengths (instead of sending a whole bunch of useless 0's to fill up the static length of a command).
The disadvantage is that every command is longer than it could ideally be. So this approach only gets worthwhile when you need a great many commands.
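A small sketch of the terminated-ID scheme, working on 4-bit groups as in the examples above. One assumption is made explicit here that the answer leaves open: an ID whose own 4-bit groups include 0000 (such as 16, which is 0001 0000) would be mistaken for a terminator, so this sketch simply refuses to encode such IDs.

```python
def encode_id(command_id: int) -> str:
    """Encode an ID as 4-bit groups followed by the 0000 terminator."""
    if command_id <= 0:
        raise ValueError("IDs must be positive")
    nibbles = []
    n = command_id
    while n:
        nibbles.append(n & 0xF)
        n >>= 4
    nibbles.reverse()
    if 0 in nibbles:
        # Assumption of this sketch: such IDs are left unused.
        raise ValueError("ID contains a 0000 group, which would read as the terminator")
    return " ".join(format(x, "04b") for x in nibbles + [0])


def decode_stream(bits: str) -> list:
    """Split a stream of space-separated 4-bit groups back into IDs."""
    ids, current = [], 0
    for group in bits.split():
        value = int(group, 2)
        if value == 0:          # terminator: the current ID is complete
            ids.append(current)
            current = 0
        else:
            current = (current << 4) | value
    return ids
```

With this, `encode_id(6)` gives the 8-bit form and `encode_id(26683)` the 20-bit form shown in the list above, and a receiver can split a continuous stream without knowing the lengths in advance.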
After defining your command set, the next step is to make sure that you received the correct message. Losing just a single bit can change a message of "Activate X-ray sensors" into "Destroy X-ray sensors" or similar. This is usually done with a checksum, which requires some more bits to transmit.
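To illustrate the checksum idea, here is a minimal sketch that appends one CRC-8 byte to a command. The choice of CRC-8 with polynomial 0x07 is an assumption made for the example, not something the answer specifies; any agreed checksum would play the same role.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 (polynomial x^8 + x^2 + x + 1, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def add_checksum(payload: bytes) -> bytes:
    """Sender side: append the checksum byte to the command."""
    return payload + bytes([crc8(payload)])


def is_intact(frame: bytes) -> bool:
    """Receiver side: recompute the checksum and compare."""
    return len(frame) >= 2 and crc8(frame[:-1]) == frame[-1]


frame = add_checksum(bytes([0x01, 0x08]))                # "activate X-ray sensor array"
damaged = bytes([frame[0] ^ 0x04]) + frame[1:]           # one bit flipped in transit
```

The cost is one extra byte per command; in exchange, a single flipped bit anywhere in the frame is detected instead of silently turning one command into another.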
Have a look at the difference between two common data transmission protocols for the internet: UDP and TCP. UDP has less overhead and so is more efficient in terms of transfer rate, while TCP trades some efficiency for reliability by adding overhead for acknowledging data and retransmitting anything that was lost or corrupted.
$endgroup$
edited 9 hours ago
Brythan
21k74286
answered 16 hours ago
Elmy
13.4k22464
6
$begingroup$
+1. If you have even the slightest notion of the kinds of things you want to say you can vastly reduce the amount of information required. “Tell [astronaut] that their [medical property] is [phrase detailing concerns]” could theoretically only take a couple of bytes to transmit if the parser is cleverly designed.
$endgroup$
– Joe Bloggs
15 hours ago
19
$begingroup$
So, basically, Q codes, created circa 1909 by the British government, to be used by maritime ships (both civilian and military) for precisely the reason in the question. As a nice bonus, language is irrelevant, so long as each party has a lookup table in their own language.
$endgroup$
– Chronocidal
15 hours ago
8
$begingroup$
1 byte -> 15 commands, 2 bytes -> 255 commands...what? Why not 256 and 16384?
$endgroup$
– genesis
15 hours ago
$begingroup$
You could take AS Interface as a reference. A small scale industrial communication bus system
$endgroup$
– Alexander von Wernherr
14 hours ago
2
$begingroup$
If using the terminated ID approach Huffman encoding could aid in ensuring the most common commands are the more efficient. Also, you might want to add to the protocol a mechanism for including parameters with the command as a separate parameter command could easily take more space than simply allowing each command to have an arbitrary number of parameters.
$endgroup$
– FluxIX
12 hours ago
$begingroup$
According to Schneier, the entropy of English text is below 1.6 bits per letter. Given a difficult constraint such as yours, I would expect people to come up with compression algorithms getting close to that.
If you don't need the full power of English you might get much better compression if you can pre-define a small set of words that would be sufficient. Something similar in principle to https://xkcd.com/1133/
I think you need to answer two important questions:
- Is the system pre-defined, i.e. can there be word-lists?
- Are characters/words sent individually or can you apply compression to a large amount of data and then send it in bulk?
If you want something that is simple, sciency and requires no setup, go with Huffman-coding individual letters based on frequency in English. ;)
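For a feel of what "bits per letter" means, here is a small sketch that measures the entropy of a text using single-letter frequencies only. That is a much cruder model than the one behind the figure quoted above, which also exploits context between letters, so on ordinary English this simple measure should be expected to come out well above 1.6. The function name is made up for the example.

```python
import math
from collections import Counter


def letter_entropy(text: str) -> float:
    """Shannon entropy in bits per letter, from single-letter frequencies."""
    letters = [c for c in text.lower() if c.isalpha()]
    total = len(letters)
    counts = Counter(letters)
    # Sum of -p * log2(p) over the letters that actually occur.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

This number is roughly the average code length that Huffman-coding individual letters can approach; getting closer to the lower figure requires coding words or longer contexts rather than single letters.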
$endgroup$
edited 16 hours ago
answered 16 hours ago
genesis
59217
1
$begingroup$
As you can see with XKCD, it gets so wordy to convey a simple idea. It may still work better than ASCII.
$endgroup$
– Andrey
6 hours ago
$begingroup$
@Andrey I'm not sure I would call rocket science a "simple idea".
$endgroup$
– TheHansinator
3 hours ago
1
$begingroup$
@Andrey Though, in all seriousness, the words chosen for the compressed language would probably be chosen specifically for the domain - e.g. the language chosen for a spaceship probably would include words like "rocket".
$endgroup$
– TheHansinator
3 hours ago
$begingroup$
You might look at digital modes for amateur radio here. Some of those modes use what's called "varicode" -- where different characters have different symbol lengths (Morse code is a varicode system -- more commonly used letters are shorter in terms of transmission time). When sending English text, a varicode matched to English letter frequencies will come close to minimizing the number of bits required for a sufficiently large sample (which reasonably ought to include a large number of messages). If "text speak" is used commonly, it might make sense to design the varicode around letter frequencies in that particular text format.
If longer messages are common, some form of compression would make sense -- text typically compresses well with common compression algorithms, but the compression headers make this inefficient for very small blocks of data (text or otherwise).
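The header overhead is easy to see with Python's standard `zlib` module. This is only a quick illustration; the sample messages and the helper name are made up.

```python
import zlib


def size_change(message: bytes) -> int:
    """Compressed size minus original size, in bytes (negative means a saving)."""
    return len(zlib.compress(message, 9)) - len(message)


short_message = b"ok"
long_message = b"comms good, thrusters good, life support good. " * 50
```

Here `size_change(short_message)` is positive, because the fixed header and trailing check value outweigh a two-byte payload, while `size_change(long_message)` is strongly negative.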
$endgroup$
answered 15 hours ago
Zeiss Ikon
$begingroup$
Textspeak
SMS messages were originally limited to 160 characters, so textspeak evolved to reduce everything down to the most compact form through abbreviations, acronyms and emoticons.
Sounds like a good reason to send teenagers into space....
$endgroup$
$begingroup$
I'm pretty sure that style of writing predates common SMS usage. I recall half the people on the internet talking that way in the late 90s, a time when most people did not have internet but even fewer people had the ability to send text messages.
$endgroup$
– Aaron
8 hours ago
$begingroup$
Was previously beeper speak: 143 133 43 43 5318008
$endgroup$
– RIanGillis
6 hours ago
edited 13 hours ago
Glorfindel
answered 16 hours ago
Thorne
$begingroup$
Building on other answers
In addition to the different encoding and compression methods, one thing to look into is shorthand techniques that allow you to drop letters while still being able to interpret the message. Some examples:
- it, to -> t
- is -> s
- have -> hv
- cat -> ct
- are -> r
Example sentence: hw r u?
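A minimal sketch of the substitution table above, assuming a simple word-by-word lookup. The entries for "how" and "you" are inferred from the example sentence rather than listed explicitly, and the names `SHORTHAND` and `shorten` are made up for illustration.

```python
import re

# Word -> shorthand, taken from the list above plus the example sentence.
SHORTHAND = {
    "it": "t", "to": "t", "is": "s", "have": "hv", "cat": "ct", "are": "r",
    "how": "hw", "you": "u",  # inferred from "hw r u?"
}

def shorten(text):
    """Replace each known word with its shorthand; leave everything else as-is."""
    def swap(match):
        word = match.group(0)
        return SHORTHAND.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, text)

if __name__ == "__main__":
    print(shorten("how are you?"))
```

Note that "it" and "to" both map to "t", so expanding the message again depends on the reader using context, exactly as with human shorthand.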
An alternative approach
Encode your information in time delays
Presumably there is some reason that you can't speed up the data transmission, but perhaps you can slow it down. At 5 bits per hour, that's 12 minutes between each bit. Instead of sending each bit at a regular interval, you can delay transmission of bits and use the delay time as a means of conveying information.
If you expect a minimum of 12 minutes between each bit, you can encode the data as follows (time is in mm:ss format):
- 12:00 = 0
- 12:05 = 1
- 12:10 = 2
- 12:15 = 3
- etc
The more data you encode, the fewer bits per day you'll be able to transmit, so there will be some optimal balance you'll have to figure out based on the minimum delay interval you consider acceptable. Then you can perhaps use the bits themselves as an error checking mechanism, or to still transmit data.
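The trade-off can be sketched numerically. The code below assumes the 12:00 base gap and 5-second steps from the table above, that every delay value is equally likely, and that the receiver can time gaps to well within 5 seconds; all three are assumptions, and the function names are made up for illustration.

```python
import math

BASE_GAP_S = 12 * 60  # minimum gap between bits (5 bits per hour)
STEP_S = 5            # extra delay per encoded value (12:00, 12:05, 12:10, ...)

def delay_for(value):
    """Gap in seconds that encodes `value` (0 -> 12:00, 1 -> 12:05, ...)."""
    return BASE_GAP_S + STEP_S * value

def value_for(delay_s):
    """Recover the value from a measured gap, rounding to the nearest step."""
    return round((delay_s - BASE_GAP_S) / STEP_S)

def bits_per_hour(levels):
    """Average throughput when each gap carries one of `levels` equally
    likely delay values in addition to the transmitted bit itself."""
    bits_per_gap = math.log2(levels) + 1
    mean_gap_s = BASE_GAP_S + STEP_S * (levels - 1) / 2
    return bits_per_gap * 3600 / mean_gap_s

if __name__ == "__main__":
    for levels in (1, 4, 16, 64, 256):
        print(levels, "delay values:", round(bits_per_hour(levels), 2), "bits/hour")
```

With a single delay value this reduces to the original 5 bits per hour. Under these assumptions, adding delay values raises throughput until the longer average gap starts to outweigh the extra bits per gap.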
$endgroup$
answered 9 hours ago
anjama
$begingroup$
Not Morse code
From Wikipedia:
International Morse code is composed of five elements:[1]
- short mark, dot or "dit" (▄▄▄▄): "dot duration" is one time unit long
- longer mark, dash or "dah" (▄▄▄▄▄▄): three time units long
- inter-element gap between the dots and dashes within a character: one dot duration or one unit long
- short gap (between letters): three time units long
- medium gap (between words): seven time units long
If we use one bit to store one time unit, it takes four bits to transmit even the shortest letter ('e') and its subsequent gap. The next shortest are 'i' and 't' at six bits. Then 'a', 'n', and 's' at eight. The longest character in the Morse alphabet is the digit 0, which requires five dashes or twenty-two units/bits. And that only supports the thirty-six character Latin alphanumeric alphabet.
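The unit counts follow directly from the five timing elements quoted above. A small sketch, with a partial Morse table limited to the characters discussed here and a made-up helper name `morse_units`:

```python
# Partial International Morse table: only the characters mentioned above.
MORSE = {"e": ".", "i": "..", "t": "-", "a": ".-", "n": "-.", "s": "...", "0": "-----"}

def morse_units(pattern):
    """Time units to send one character plus the gap that follows it."""
    marks = sum(1 if element == "." else 3 for element in pattern)  # dot = 1, dash = 3
    inner_gaps = len(pattern) - 1  # one unit between elements of a character
    letter_gap = 3                 # three units before the next letter
    return marks + inner_gaps + letter_gap

if __name__ == "__main__":
    for char, pattern in MORSE.items():
        print(char, pattern, morse_units(pattern))
```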
Morse is designed around humans. Humans do better with indeterminate length than fixed length, as we don't have good timing ability (we can't tell a five unit pause from a four unit pause consistently). But if these messages are being transmitted computer to computer, computers have great ability at timing. We can use superior fixed length formats. Heck, even with humans, twelve-minute-long units mean that it is easy to track whether you're getting a pause or a dot (a zero or a one).
It gets even worse if you are transmitting Morse over bits, because (extended) ASCII's eight bits per character is more efficient unless the message is composed entirely of the letters in 'eitans'.
Bits
Meanwhile, if we transmit ASCII, we could transmit a 0 with eight bits. If we break things into nybbles, we can transmit one nybble with a checksum bit every hour. So two hours to transmit one character with some error detection included. Or ninety-six minutes without the checksums.
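A sketch of the nybble-plus-check-bit framing, assuming the "checksum bit" is a single even-parity bit appended to each four-bit nybble (the answer does not say which check is meant). The function names are hypothetical.

```python
def char_to_frames(ch):
    """Split one 8-bit character into two 5-bit frames:
    four data bits followed by one even-parity bit."""
    byte = ord(ch)
    if byte > 0xFF:
        raise ValueError("only 8-bit (extended ASCII) characters are supported")
    frames = []
    for nybble in (byte >> 4, byte & 0x0F):  # high nybble first
        parity = bin(nybble).count("1") % 2  # makes the total count of 1s even
        frames.append(format(nybble, "04b") + str(parity))
    return frames

def frame_ok(frame):
    """A frame passes the check when it contains an even number of 1s."""
    return frame.count("1") % 2 == 0

if __name__ == "__main__":
    frames = char_to_frames("0")
    print(frames, [frame_ok(f) for f in frames])
```

At 5 bits per hour, each frame takes exactly one hour, which gives the two hours per character mentioned above. A single parity bit only detects an odd number of flipped bits; it cannot correct them.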
If we instead use ten bits (two hours), we can do something like Lempel-Ziv. So the first 256 codes are the extended ASCII set. The remaining 768 codes actually represent multiple characters. So common sequences (e.g. "the ", "ing", and "tion") would have their own ten-bit representation, e.g. 0100000000. This allows the full flexibility of ASCII while also producing a shorter message on average.
The Lempel-Ziv algorithm builds the dictionary from the message itself. We can do better by agreeing on a dictionary beforehand. You can also use this to integrate the error correction and the dictionary, which improves your effective speed.
Numbers are generally going to be better sent as bits than as characters. I.e. instead of sending ASCII 3840, just send 111100000000. That's only twelve bits, hardly more than a single character.
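The saving is easy to check. The helper names below are made up for the example, and the sketch ignores how the receiver learns the width of the binary field, which would have to be agreed beforehand.

```python
def ascii_bits(n):
    """Bits needed to send the decimal digits of `n` as 8-bit characters."""
    return 8 * len(str(n))

def binary_bits(n):
    """Bits needed to send non-negative `n` as a plain binary number."""
    return max(1, n.bit_length())

if __name__ == "__main__":
    n = 3840
    print(format(n, "b"), binary_bits(n), "bits versus", ascii_bits(n), "bits as characters")
```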
$endgroup$
answered 8 hours ago
Brythan
$begingroup$
A receiver will pick up the radio signal plus background noise (most notably cosmic background radiation). Generally, the wider the receiver bandwidth, the greater the received noise power. So to get a good signal-to-noise ratio (SNR), one can transmit the radio signal within a very narrow frequency band and put a very narrow band-pass filter on the front of the receiver.
EXAMPLE: The receiver is picking up 1 microwatt of radio signal and 1 milliwatt of noise power with a 1 MHz bandwidth (so an SNR of 0.001).
Dropping the bandwidth to 10 Hz would result in 1 microwatt of radio signal power and 10 nanowatts of received noise power (so an SNR of 100).
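The arithmetic in this example can be checked with a few lines. The sketch assumes the noise is spread evenly across the band, so noise power scales linearly with bandwidth, and that the whole signal still fits inside the narrower filter. The function name is hypothetical.

```python
import math

def snr_after_filter(signal_w, noise_w, old_bw_hz, new_bw_hz):
    """SNR (as a plain ratio) after narrowing the receiver bandwidth,
    assuming flat noise density and a signal that fits in the new band."""
    new_noise_w = noise_w * new_bw_hz / old_bw_hz
    return signal_w / new_noise_w

if __name__ == "__main__":
    before = 1e-6 / 1e-3                            # 1 uW signal, 1 mW noise at 1 MHz
    after = snr_after_filter(1e-6, 1e-3, 1e6, 10)   # same signal through a 10 Hz filter
    for label, snr in (("1 MHz", before), ("10 Hz", after)):
        print(label, "SNR:", snr, "=", round(10 * math.log10(snr), 1), "dB")
```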
Consider a protocol like PSK31 (or similar) used by ham radio operators instead of Morse code.
PSK31 sends 1s and 0s as phase shifts of a single tone, with symbols of relatively long duration. The longer those symbols are, the narrower the filter at the receiver can be. PSKxxx can be used to send low data rate messages across the planet using only a few watts of power.
Another alternative (though more complex) is using long strings of physical 1s and 0s to represent a single symbol in the protocol. This method is used by GPS, for example. The GPS signal is normally about 30X lower power than the background noise, but by correlating long strings of 1023 bits the receiver is able to lock onto the signal on average.
EXAMPLE: Define a long sequence of physical 1s and 0s for each letter of the alphabet. Each code is very different from the other codes.
Let A be 00101010 10001010 10100101 00101010 ...
Let B be 10100001 10100101 00010101 00010100 ...
Let C be 01001010 01010100 00010100 00110101 ...
The sequences may be thousands of bits long if you want. The patterns are generated by a computer automatically when the user types a letter on the keyboard.
The physical bit sequences are sent at a much higher rate than the actual symbols. For example, if you want to send one symbol per second and your sequences are 1000 bits long, then you send the physical bits at 1000 bits per second.
When receiving the signal plus noise, the noise will cause the receiver to make the wrong decision on the physical 1s and 0s some percentage of the time. The receiver stores the received bit pattern and compares it to each of the codes. The receiver then selects the code which most closely matches the received pattern. Even if a large fraction of the received bits are wrong, the received pattern is likely to match the code sent by the transmitter more closely than any of the other codes. Thus the receiver can determine what the transmitter sent even if the background noise is much higher than the received radio signal.
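The matching step can be sketched as a nearest-code search. The example below assumes random 1000-bit codes generated from a shared seed (a stand-in for properly designed codes such as those GPS uses), hard bit decisions at the receiver, and a 35% chance that any given bit is received wrong. The names `CODEBOOK`, `transmit` and `receive` are made up for illustration.

```python
import random
import string

CODE_LEN = 1000  # physical bits per symbol, as in the example above

# Both ends build the same hypothetical codebook from an agreed seed.
_codebook_rng = random.Random(42)
CODEBOOK = {
    letter: [_codebook_rng.getrandbits(1) for _ in range(CODE_LEN)]
    for letter in string.ascii_uppercase
}

def transmit(letter, error_rate, noise):
    """Return the code for `letter` with each bit flipped with probability
    `error_rate`, standing in for wrong decisions caused by noise."""
    return [bit ^ (noise.random() < error_rate) for bit in CODEBOOK[letter]]

def receive(bits):
    """Pick the letter whose code differs from the received bits in the fewest places."""
    def distance(code):
        return sum(a != b for a, b in zip(code, bits))
    return min(CODEBOOK, key=lambda letter: distance(CODEBOOK[letter]))

if __name__ == "__main__":
    noise = random.Random(7)
    message = "ROCKET"
    decoded = "".join(receive(transmit(ch, 0.35, noise)) for ch in message)
    print(decoded)
```

The correct code differs from the received pattern in roughly 350 places, while every other code differs in roughly 500, so the right letter stands out clearly. With hard decisions like these the scheme only works while fewer than half the bits are wrong.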
Another advantage of using long codes is that they inherently correct physical bit errors at the receiver. Also, different transmitters can each use different code sets so they can talk at the same time (this approach is how CDMA cell phones work).
$endgroup$
$begingroup$
Yeah, something similar to PSK31, or maybe JT65, FT8 or some such for their low S/N characteristics, was my first thought as well. One benefit of it, and many similar encoding schemes (I'm not sure I really want to call it a modulation per se, though one could make an argument that it's a baseband modulation) is that they use variable-length encodings. That requires some kind of synchronization, but allows the more commonly used code points to be encoded more compactly and thus transmitted more quickly.
$endgroup$
– a CVn♦
9 hours ago
add a comment |
$begingroup$
A receiver will pick up the raido signal plus background noise (most notably cosmic background radiation). Generally the received noise power is greater for greater receiver bandwidth. So to get a good signal to noise ratio one can transmit the radio signal within a very narrow frequecy band and put a very narrow band filter on the front of the receiver.
EXAMPLE: The receiver was picking up 1 micro-watt of radio signal and 1 milli-watt of noise power with a 1MHz bandwidth (so a SNR of 0.001).
Droping the bandwith to 10Hz would result in 1 micro watt of radio signal power and 10 nano-watts of received noise power (so a SNR of 100)
Consider a protocol like PSK31 (or similar) used by HAM radios instead of moorse code.
PSK31 uses pure tones of relatively long duration to send 1s and 0s. The longer those tones are the more narrow the filter at the receiver can be. PSKxxx can be used to send low data rate messages across the plannet using only a few watts of power.
Another alternative (though more complex) is using long strings of physical 1s and 0s to represent a single symbol in the protocol. This method is used by GPS for example. The GPS signal is normally about 30X lower power than the background noise, but by correlating long strings of 1024 bits the receiver is able to on average lock onto the signal.
EXAMPLE: Define two long sequences of physical 1s and 0s for each letter of the alphabet. Each code is very different from the other codes.
Let A be 00101010 10001010 10100101 00101010 ...
Let B be 10100001 10100101 00010101 00010100 ...
Let C be 01001010 01010100 00010100 00110101 ...
The sequences may be thousands of bits long if you want. The patterns are generated by a computer automatically when the user types a letter on the keyboard.
The physical bit sequences are sent at a much higher rate than the actual symbols. For example if you want t send one symbol per second and your sequences are 1000 bits long then you send the physical bits at 1000 bits per second.
When receiving the signal + noise; the noise will cause the receiver to make the wrong decision on the physical 1s and 0s some percentage of the time. The receiver stores the received bit pattern and compares it to one of the codes. The receiver then selects the code which most closely matches the received pattern. Even if most of the received bits are wrong, the received code is likely to match most closely to the code sent by the transmitter rather than one of the other codes. Thus the receiver can determine what the transmitter sent even if the background noise is much higher than the received radio signal.
Some other advantages of using long codes is that the codes inherently correct physical bit errors at the receiver. Also different transmitters can each use different code sets so they can talk at the same time (this approach is how CDMA cell phones work).
$endgroup$
$begingroup$
Yeah, something similar to PSK31, or maybe JT65, FT8 or some such for their low S/N characteristics, was my first thought as well. One benefit of it, and many similar encoding schemes (I'm not sure I really want to call it a modulation per se, though one could make an argument that it's a baseband modulation) is that they use variable-length encodings. That requires some kind of synchronization, but allows the more commonly used code points to be encoded more compactly and thus transmitted more quickly.
$endgroup$
– a CVn♦
9 hours ago
add a comment |
$begingroup$
A receiver will pick up the raido signal plus background noise (most notably cosmic background radiation). Generally the received noise power is greater for greater receiver bandwidth. So to get a good signal to noise ratio one can transmit the radio signal within a very narrow frequecy band and put a very narrow band filter on the front of the receiver.
EXAMPLE: The receiver was picking up 1 micro-watt of radio signal and 1 milli-watt of noise power with a 1MHz bandwidth (so a SNR of 0.001).
Droping the bandwith to 10Hz would result in 1 micro watt of radio signal power and 10 nano-watts of received noise power (so a SNR of 100)
Consider a protocol like PSK31 (or similar) used by HAM radios instead of moorse code.
PSK31 uses pure tones of relatively long duration to send 1s and 0s. The longer those tones are the more narrow the filter at the receiver can be. PSKxxx can be used to send low data rate messages across the plannet using only a few watts of power.
Another alternative (though more complex) is using long strings of physical 1s and 0s to represent a single symbol in the protocol. This method is used by GPS for example. The GPS signal is normally about 30X lower power than the background noise, but by correlating long strings of 1024 bits the receiver is able to on average lock onto the signal.
EXAMPLE: Define two long sequences of physical 1s and 0s for each letter of the alphabet. Each code is very different from the other codes.
Let A be 00101010 10001010 10100101 00101010 ...
Let B be 10100001 10100101 00010101 00010100 ...
Let C be 01001010 01010100 00010100 00110101 ...
The sequences may be thousands of bits long if you want. The patterns are generated by a computer automatically when the user types a letter on the keyboard.
The physical bit sequences are sent at a much higher rate than the actual symbols. For example if you want t send one symbol per second and your sequences are 1000 bits long then you send the physical bits at 1000 bits per second.
When receiving the signal + noise; the noise will cause the receiver to make the wrong decision on the physical 1s and 0s some percentage of the time. The receiver stores the received bit pattern and compares it to one of the codes. The receiver then selects the code which most closely matches the received pattern. Even if most of the received bits are wrong, the received code is likely to match most closely to the code sent by the transmitter rather than one of the other codes. Thus the receiver can determine what the transmitter sent even if the background noise is much higher than the received radio signal.
Some other advantages of using long codes is that the codes inherently correct physical bit errors at the receiver. Also different transmitters can each use different code sets so they can talk at the same time (this approach is how CDMA cell phones work).
$endgroup$
A receiver will pick up the raido signal plus background noise (most notably cosmic background radiation). Generally the received noise power is greater for greater receiver bandwidth. So to get a good signal to noise ratio one can transmit the radio signal within a very narrow frequecy band and put a very narrow band filter on the front of the receiver.
EXAMPLE: The receiver was picking up 1 micro-watt of radio signal and 1 milli-watt of noise power with a 1MHz bandwidth (so a SNR of 0.001).
Droping the bandwith to 10Hz would result in 1 micro watt of radio signal power and 10 nano-watts of received noise power (so a SNR of 100)
Consider a protocol like PSK31 (or similar) used by HAM radios instead of moorse code.
PSK31 uses pure tones of relatively long duration to send 1s and 0s. The longer those tones are the more narrow the filter at the receiver can be. PSKxxx can be used to send low data rate messages across the plannet using only a few watts of power.
Another alternative (though more complex) is using long strings of physical 1s and 0s to represent a single symbol in the protocol. This method is used by GPS for example. The GPS signal is normally about 30X lower power than the background noise, but by correlating long strings of 1024 bits the receiver is able to on average lock onto the signal.
EXAMPLE: Define a long sequence of physical 1s and 0s for each letter of the alphabet. Each code is very different from the other codes.
Let A be 00101010 10001010 10100101 00101010 ...
Let B be 10100001 10100101 00010101 00010100 ...
Let C be 01001010 01010100 00010100 00110101 ...
The sequences may be thousands of bits long if you want. The patterns are generated by a computer automatically when the user types a letter on the keyboard.
The physical bit sequences are sent at a much higher rate than the actual symbols. For example, if you want to send one symbol per second and your sequences are 1000 bits long, then you send the physical bits at 1000 bits per second.
When receiving the signal plus noise, the noise will cause the receiver to make the wrong decision on the physical 1s and 0s some percentage of the time. The receiver stores the received bit pattern and compares it to each of the codes. The receiver then selects the code that most closely matches the received pattern. Even if a large fraction of the received bits are wrong, the received pattern is likely to match the code sent by the transmitter more closely than any of the other codes. Thus the receiver can determine what the transmitter sent even if the background noise is much higher than the received radio signal.
Another advantage of using long codes is that they inherently correct physical bit errors at the receiver. Also, different transmitters can each use different code sets so they can talk at the same time (this approach is how CDMA cell phones work).
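The closest-match decoding described above can be sketched as follows. This is only an illustration under assumptions that are not in the answer: the codes are random 1000-bit sequences (random sequences of that length differ from each other in roughly half their positions), the channel flips each physical bit independently with 30% probability, and all names (`make_codebook`, `add_noise`, `decode_symbol`) are invented for the example.

```python
import random

CODE_LENGTH = 1000                      # physical bits per letter (assumed)
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def make_codebook(seed=1):
    """One long pseudo-random bit sequence per letter, shared by both ends."""
    rng = random.Random(seed)
    return {ch: [rng.randint(0, 1) for _ in range(CODE_LENGTH)] for ch in ALPHABET}


def add_noise(bits, error_rate, rng):
    """Flip each physical bit independently with probability error_rate."""
    return [b ^ 1 if rng.random() < error_rate else b for b in bits]


def decode_symbol(received, codebook):
    """Pick the letter whose code differs from the received pattern in the fewest bits."""
    def distance(code):
        return sum(r != c for r, c in zip(received, code))
    return min(codebook, key=lambda ch: distance(codebook[ch]))


codebook = make_codebook()
channel = random.Random(2)              # stands in for the noisy radio link
message = "HELP"
received = [add_noise(codebook[ch], 0.30, channel) for ch in message]
decoded = "".join(decode_symbol(pattern, codebook) for pattern in received)
print(decoded)
```

With 30% of the bits flipped, the received pattern still sits about 300 bits away from the right code and about 500 bits away from every wrong one, so the correct letter wins by a wide margin. The price is the data rate: 1000 physical bits are spent per letter.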
edited 9 hours ago
answered 10 hours ago
user4574
$begingroup$
Yeah, something similar to PSK31, or maybe JT65, FT8 or some such for their low S/N characteristics, was my first thought as well. One benefit of it, and many similar encoding schemes (I'm not sure I really want to call it a modulation per se, though one could make an argument that it's a baseband modulation) is that they use variable-length encodings. That requires some kind of synchronization, but allows the more commonly used code points to be encoded more compactly and thus transmitted more quickly.
$endgroup$
– a CVn♦
9 hours ago
$begingroup$
- Encode whole words instead of single letters.
- Use Huffman encoding based on word frequency in the specific context of space travel, so that frequent words ('the', 'yes', 'shields') take fewer bits than less frequent words.
- Use Markov chains to take the context of the sentence into account as well.
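The first two points can be sketched with a word-level Huffman code. The word counts below are invented purely for illustration; a real table would be built from actual mission traffic and agreed on before launch. The Markov-chain refinement is not shown.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a prefix code {symbol: bitstring} from a {symbol: weight} table."""
    if len(freqs) == 1:
        return {sym: "0" for sym in freqs}
    tiebreak = count()                   # keeps heap entries comparable on equal weights
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # the two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical word counts, not taken from any real corpus.
WORD_FREQS = {"the": 50, "yes": 30, "shields": 20, "warp": 5, "asteroid": 2, "Klingon": 1}
codes = huffman_codes(WORD_FREQS)

for word in sorted(codes, key=lambda w: len(codes[w])):
    print(f"{word:10s} {codes[word]}")

sentence = ["yes", "the", "shields"]
encoded = "".join(codes[word] for word in sentence)
print(encoded, len(encoded), "bits")
```

Frequent words end up with the shortest bit strings, and because no code is a prefix of another, the words can be sent back to back without separators.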
$endgroup$
answered 7 hours ago
Helena
$begingroup$
Oops! When I was writing this, I forgot you said "5 bits per hour" and was thinking "5 bits per day"... read all instances of "day" below as "hour". I'm leaving this message temporarily in case I missed any instances.
Here is the literal answer to your question:
Use base64 character encoding. This allows you to represent the English characters, including numbers, which is precisely what you said you wanted, using 6 bits per character, which is just 1 bit more than fits into 1 hour's worth of transmission in your circumstance.
And here are some enhancements to that by adding special "modes"...
Base64 includes both upper and lower case letters. If you are fine with restricting yourself to one case, which would still fulfill your requirements, then you would have room left to include more punctuation or other enhancements (up to 26 other enhanced transmission modes). I would recommend using some of this extra space to represent some extremely common words or short phrases that you would use very, very often. Then use a few of the character slots for other special meanings, such as "the next few bytes represent status codes" or "the following data is compressed".
Mode 1: Table of most common words or phrases
For the examples below, I'll assume that 2 of the characters select different word/phrase lookup tables, each followed by a 9-bit index, since this allows a lookup to take exactly 3 hours to send, including the initial 6 bits (6 + 9 = 15 bits = 3 hours). This allows for 2^9 = 512 entries per table, times 2 tables; that is, 1024 different shortened words or phrases.
Using this format...
"Hey Bob" as plain text requires 6*7 = 42 bits = 8 hours + 2 bits.
"communication array damaged by [reason]", assuming "communication array" and "damaged" are both in the lookup table, would take 9 hours + 3 bits + [however long it takes to send the reason]. "communication array damaged by Klingon torp" would take 24 hours, or less if either "Klingon" or "torp" were sent as lookup words instead of as plain text.
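These timings can be reproduced with a small cost calculator. This is a sketch under one inferred assumption: the quoted figures work out exactly if "by" is sent as 3 plain-text characters ("by" plus one space) and the spaces around lookup words are not sent at all. The names `cost`, `CHAR_BITS` and `LOOKUP_BITS` are made up for the example.

```python
BITS_PER_HOUR = 5        # link speed given in the question
CHAR_BITS = 6            # one base64-style character
LOOKUP_BITS = 6 + 9      # mode character plus 9-bit table index


def cost(parts):
    """Return (hours, leftover_bits) for a list of ("text" | "lookup", value) parts."""
    bits = 0
    for kind, value in parts:
        if kind == "text":
            bits += CHAR_BITS * len(value)
        elif kind == "lookup":
            bits += LOOKUP_BITS
        else:
            raise ValueError(f"unknown part kind: {kind}")
    return divmod(bits, BITS_PER_HOUR)


hey_bob = cost([("text", "Hey Bob")])
damaged = [("lookup", "communication array"), ("lookup", "damaged"), ("text", "by ")]
damaged_prefix = cost(damaged)
damaged_full = cost(damaged + [("text", "Klingon torp")])

print(hey_bob)         # (hours, leftover bits)
print(damaged_prefix)
print(damaged_full)
```

Swapping `("text", "Klingon torp")` for lookups shows how much a well-chosen table saves on a 5-bit-per-hour link.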
Mode 2: Look-back
This is a "repeat recent word" mode. In computer science, it has been shown that recently used data is among the most likely data to be used next, and that is what a PC memory cache is for. We can do something similar by making 1 of the character slots represent "The next 4 bits refer to a previous word; count back that many words in the most recent transmitted data."
With this, "Klingon fleet approaches from 294 and Klingon admiral on comms saying Klingon destroyers equipped with new black hole tech" allows you to shorten 2 instances of "Klingon" to exactly 2 hours worth of data each; the first one providing "0110" (6) as the 4-bit-lookback value and the second instance being "0101" (5). In some communications this could save a lot of time if words are repeated often. Note that pronouns like "their" could have been used in some places, but that would take 7 plaintext characters (including surrounding spaces), which in this case would have taken 6 hours + 2 bits longer to send.
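The look-back values in this example can be derived mechanically. The sketch below assumes words are split on spaces and that the 4-bit field holds distances 1 to 15; `lookback_distances` is a hypothetical helper, not part of any existing protocol.

```python
def lookback_distances(words, max_back=15):
    """For each word, how many words back it last appeared (None if not within reach)."""
    result = []
    for i, word in enumerate(words):
        distance = None
        for back in range(1, min(i, max_back) + 1):
            if words[i - back] == word:
                distance = back
                break
        result.append((word, distance))
    return result


message = ("Klingon fleet approaches from 294 and Klingon admiral on comms "
           "saying Klingon destroyers equipped with new black hole tech")

for word, distance in lookback_distances(message.split()):
    if distance is not None:
        # 6 bits for the mode character + 4 bits for the distance = 10 bits = 2 hours
        print(word, distance, format(distance, "04b"))
```

Only the second and third "Klingon" get a look-back code; every other word has no earlier copy within reach and falls back to plain text or a table lookup.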
Mode 3: Copy/paste, possibly with separate paste-buffers
This would allow customized shortcuts that were not thought of before launch, a sort of copy/paste. Send a "start copying here" marker, continue the message, then send a "stop copying here" marker; later you can send a "paste" character to repeat the copied text. For example, send "comms good, thrusters good, life support good, magnetic-artificial-gravity [copy]working intermittently due to a swarm of flies in the grav-capacitor[end copy] from a meal someone left out for days". You only have to send that long text 1 time; each time after that you send "comms good, thrusters good, life support good, magnetic-artificial-gravity [paste]", and you do that until the status changes back to "good". Also, "comms", "thrusters", "life support", "magnetic-artificial-gravity", and "good" might all be in the lookup table, meaning this entire message takes 22 hours + 1 bit to send after the first time you send it. Even better is to have this "paste mode" be followed by a few bits for a "paste buffer number". Then you could send "[copy1]comms good, [etc.], [copy2]working intermittently due to...[end-copy-2][end-copy-1] from a meal that..." Then every time you want to send an updated status that is the same as the one before, it takes 1 hour plus a few bits.
You can tweak the exact representation (different send modes, number of bits for each, etc.) to improve performance further based on your expected communications.
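A receiver for the copy/paste mode might look like the sketch below. The token format (`"text"`, `"copy"`, `"end"`, `"paste"` with a buffer number) is an assumption made for illustration; the answer does not fix a wire format. Both ends must keep identical buffers for a later paste to make sense.

```python
def expand(tokens, buffers):
    """Rebuild the full text from tokens, updating the shared paste buffers."""
    out = []
    open_marks = {}                     # buffer number -> position where copying started
    for kind, value in tokens:
        if kind == "text":
            out.append(value)
        elif kind == "copy":
            open_marks[value] = len(out)
        elif kind == "end":
            start = open_marks.pop(value)
            buffers[value] = "".join(out[start:])
        elif kind == "paste":
            out.append(buffers[value])
        else:
            raise ValueError(f"unknown token kind: {kind}")
    return "".join(out)


buffers = {}
first = expand(
    [("copy", 1), ("text", "comms good, gravity "),
     ("copy", 2), ("text", "working intermittently"), ("end", 2), ("end", 1)],
    buffers,
)
repeat = expand([("paste", 1)], buffers)                    # whole status again
partial = expand([("text", "comms down, gravity "), ("paste", 2)], buffers)
print(first)
print(repeat)
print(partial)
```

Nesting buffer 2 inside buffer 1 lets later messages reuse either the whole status report or just the long fault description.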
Mode 4: Compression
Just what it sounds like. The following data uses a given compression algorithm.
Multiple versions of each mode
If you cut out lower case and still have character slots left over after implementing all the punctuation and modes you want, you can use the leftover slots to expand modes that you already have. This is what I did above with the lookup tables: I suggested using 2 of the base64 character slots freed up by dropping lower case to give us 2 separate lookup tables. You could do the same to double your look-back reach or your number of copy/paste buffers. You can also increase or decrease the number of bits following a mode character, such as having 4 bits after copy/paste to get 16 paste buffers, or only 2 extra bits to save on transmission time but allow only 4 paste buffers.
So how efficient is this?
In the worst case this requires 6 bits per character. In the average case you will use a few lookups, look-backs, or some compression to beat the worst case, so you need roughly 3-5 bits per character. The best case is messages that can be relayed entirely, or nearly entirely, by lookups and look-backs, which will be common for normal day-to-day activities that go as expected. For such communication, if the number of bits for each special mode is well tuned, you should achieve better than 1 bit per character, and often much, much better, as with the status report example a couple of paragraphs up.
$endgroup$
edited 6 hours ago
answered 7 hours ago
Aaron
$begingroup$
Your modes 1-3 are already part of common LZ-derived compression algorithms. The "table of common phrases" strategy, for example, is exposed by zlib through methods to set a pre-agreed "dictionary". (Some additional tweaking to the algorithm could probably be applied to make it squeeze the last few bits out of short messages, though.)
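As a rough illustration of the preset-dictionary idea, Python's `zlib` module accepts a `zdict` argument on `compressobj` and `decompressobj`. The dictionary contents and the helper names `pack` and `unpack` below are made up for the example; both ends would have to agree on the exact same dictionary bytes in advance.

```python
import zlib

# Hypothetical pre-agreed dictionary of common phrases.
SHARED_DICT = (b"thrusters good life support good comms good "
               b"Klingon torp communication array damaged by ")


def pack(message, zdict=None):
    """Compress one message, optionally priming zlib with the shared dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(message) + comp.flush()


def unpack(blob, zdict=None):
    """Inverse of pack(); must be given the same dictionary."""
    decomp = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


message = b"communication array damaged by Klingon torp"
plain = pack(message)
primed = pack(message, SHARED_DICT)
print(len(message), len(plain), len(primed))
```

The zlib stream still carries a few bytes of header and checksum, which is the kind of overhead the comment suggests could be tweaked away for very short messages.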
$endgroup$
– Henning Makholm
5 hours ago
$begingroup$
Huffman Encoding
You want the same methodology we use today when writing a .zip file. We take the most common character in the file (probably 'e') and say that it simply corresponds to the bit '1'. Then the next most common one ('a' maybe?) will be '01', and the next most common (let's say 't') will be '001'.
So, given this system, "eat" = "101001", while "tea" = "001101".
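To make the example concrete, here is a small sketch that encodes and decodes with exactly the three-entry table above. The function names are just for illustration. Decoding works bit by bit because no code is a prefix of another.

```python
# The toy table from the answer: most common characters get the shortest codes.
CODE = {"e": "1", "a": "01", "t": "001"}


def encode(text: str) -> str:
    """Concatenate the bit string for each character."""
    return "".join(CODE[ch] for ch in text)


def decode(bits: str) -> str:
    """Read bits until they match a code, emit that character, repeat."""
    reverse = {code: ch for ch, code in CODE.items()}
    out = []
    buffer = ""
    for bit in bits:
        buffer += bit
        if buffer in reverse:
            out.append(reverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


print(encode("eat"))     # 101001
print(decode("001101"))  # tea
```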
For a code that assigns a whole number of bits to each character, this is the most efficient encoding there is: it gives you access to any number of characters while still using very few bits for the vast majority of the ones you're using. (Schemes such as arithmetic coding, or ones that model whole words and phrases, can squeeze out a little more.)
Note though: this is most effective when some letters/characters are used far more than others (as is the case in modern English).
Also, a compressed file normally carries along a "dictionary" (the table mapping bit combinations to characters) so the other person can translate the message back. That is wasteful to send every time, especially for short messages. However, if every user already holds a well-known dictionary built to represent common English usage, it never has to be transmitted and the scheme can work.
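For anyone curious how such a shared dictionary could be derived, below is a minimal sketch of the standard Huffman construction in Python. The frequency table `FREQS` is invented for the example and is not real English statistics. Both ends would run this once on the agreed table, so the resulting codes never need to be sent.

```python
import heapq
from itertools import count

# Hypothetical relative frequencies; a real table would cover the full alphabet.
FREQS = {" ": 18, "e": 12, "t": 9, "a": 8, "o": 7, "z": 1}


def huffman_code(freqs):
    """Build a prefix code by repeatedly merging the two rarest groups."""
    if len(freqs) == 1:
        # Degenerate case: a single symbol still needs one bit.
        return {sym: "0" for sym in freqs}
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        # Prefix one group with 0 and the other with 1, then merge them.
        merged = {sym: "0" + bits for sym, bits in group1.items()}
        merged.update({sym: "1" + bits for sym, bits in group2.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


code = huffman_code(FREQS)
for sym, bits in sorted(code.items(), key=lambda item: len(item[1])):
    print(repr(sym), bits)
```

Common characters end up with short codes and rare ones with long codes, which is the whole point of the scheme.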
$endgroup$
answered 1 hour ago
Bert Haddad
2
$begingroup$
What type of spacecraft are we looking at here, and what data does it need to communicate?
$endgroup$
– Cadence
16 hours ago
15
$begingroup$
What's the relationship between bits and Morse code? At five bits per hour you cannot reasonably communicate anything other than predefined status messages. (English has an entropy of about 1.3 bits per character, and a typical word is 4 or 5 characters. At 5 bits per hour with optimal compression you can send about 20 words per day of unconstrained text. This is too low, so in practice you will want to predefine a number of status messages and send an index into the message table.)
$endgroup$
– AlexP
16 hours ago
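The arithmetic in the comment above can be checked with a few lines of Python. The entropy and word-length figures are the ones quoted in the comment (4.5 is just the midpoint of "4 or 5 characters"); the status table is a hypothetical example of the "index into the message table" idea, where one 5-bit transmission selects one of up to 32 predefined messages.

```python
BITS_PER_HOUR = 5
ENTROPY_BITS_PER_CHAR = 1.3  # figure quoted in the comment
CHARS_PER_WORD = 4.5         # assumed midpoint of "4 or 5 characters"

bits_per_day = BITS_PER_HOUR * 24
chars_per_day = bits_per_day / ENTROPY_BITS_PER_CHAR
words_per_day = chars_per_day / CHARS_PER_WORD
print(bits_per_day, round(chars_per_day, 1), round(words_per_day, 1))

# Hypothetical table of predefined status messages (at most 2**5 = 32 entries).
STATUS_MESSAGES = [
    "all systems nominal",
    "fuel low",
    "request docking",
    "hull breach",
]


def encode_status(index: int) -> str:
    """One hour of transmission: a 5-bit index into the shared table."""
    if not 0 <= index < len(STATUS_MESSAGES):
        raise ValueError("unknown status index")
    return format(index, "05b")


def decode_status(bits: str) -> str:
    """Look the received 5 bits up in the same table on the other end."""
    return STATUS_MESSAGES[int(bits, 2)]


print(encode_status(3), "->", decode_status("00011"))
```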
2
$begingroup$
If it's 5 bits per hour excluding error handling, you're probably better off dropping error handling and using those extra bits for the message. A mildly scrambled message is still better than not being able to send the message at all (usually).
$endgroup$
– Jack Aidley
13 hours ago
2
$begingroup$
Why 5 bits per hour? You can manage way better with radio today, which is one of the lowest tech options for space communication. Radio is still pretty poor, but it provides way better speeds than "5 bits an hour".
$endgroup$
– opa
10 hours ago
1
$begingroup$
@opa I am assuming OP has FTL communication, it's just slow. Your simple radio will take decades or centuries to send a signal.
$endgroup$
– Andrey
6 hours ago