Most bit efficient text communication method?

















What would be the most efficient text communication method for a spacecraft operating on a super low bit rate (I'm talking something like 5 bits an hour, excluding error handling)?



As you want both complexity (full English language and numbers) and speed (letters per day), resorting to something like Morse code seems the most obvious solution, but are there any other options out there?










space-travel communication interstellar-travel






asked 16 hours ago by Exostrike








  • What type of spacecraft are we looking at here, and what data does it need to communicate? – Cadence, 16 hours ago

  • What's the relationship between bits and Morse code? At five bits per hour you cannot reasonably communicate anything other than predefined status messages. (English has an entropy of about 1.3 bits per character, and a typical word is 4 or 5 characters. At 5 bits per hour with optimal compression you can send about 20 words per day of unconstrained text. This is too low, so in practice you will want to predefine a number of status messages and send an index into the message table.) – AlexP, 16 hours ago

  • If it's 5 bits per hour excluding error handling, you're probably better off dropping error handling and using those extra bits for the message. A mildly scrambled message is still better than not being able to send the message at all (usually). – Jack Aidley, 13 hours ago

  • Why 5 bits per hour? You can manage way better with radio today, which is one of the lowest-tech options for space communication. Radio is still pretty poor, but it provides way better speeds than "5 bits an hour". – opa, 10 hours ago

  • @opa I am assuming OP has FTL communication; it's just slow. Your simple radio would take decades or centuries to send a signal. – Andrey, 6 hours ago
























10 Answers



















The most efficient communication is probably a command set. Since you contemplated Morse code, I assume that the communication is done via a fully defined interface - both sender and receiver know what a bit sequence is supposed to mean.



A command set is no more than giving different codes predefined meanings. With one single bit you can define 2 commands:



| value | meaning   |
|-------|-----------|
| 0     | light off |
| 1     | light on  |


With 4 bits you can define 16 different commands, with 1 byte (8 bits) 256 commands, with 2 bytes 65,536 commands, and so on: n bits give 2^n distinct codes. If all you really need is to display texts to an astronaut, you can store a set of ready-made texts like "Activate X-ray sensors" in a database and send the corresponding message ID from Earth. For more complex messages you can store text templates in the database and then compile a message from several templates.
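A minimal sketch of the message-table idea, assuming both ends hold an identical list of canned texts. The table contents and the names `MESSAGES`, `bits_needed`, `encode_message` and `decode_message` are hypothetical; the point is that the fixed width only grows with the base-2 logarithm of the table size.

```python
import math

# Hypothetical shared message table: sender and receiver hold identical copies.
MESSAGES = [
    "All systems nominal",
    "Activate X-ray sensors",
    "Return to base",
    "Awaiting instructions",
    "Hull breach detected",
]


def bits_needed(table_size: int) -> int:
    """Smallest fixed width, in bits, that can index every entry of a table."""
    return max(1, math.ceil(math.log2(table_size)))


def encode_message(text: str) -> str:
    """Return the fixed-width bit string for a canned message."""
    width = bits_needed(len(MESSAGES))
    return format(MESSAGES.index(text), f"0{width}b")


def decode_message(bits: str) -> str:
    """Look the received index up in the shared table."""
    return MESSAGES[int(bits, 2)]


if __name__ == "__main__":
    for size in (2, 16, 256, 65536):
        print(f"{size} commands need {bits_needed(size)} bits")
    code = encode_message("Activate X-ray sensors")
    print(code, "->", decode_message(code))
```

With five entries the index needs 3 bits, so a 22-character instruction costs well under an hour at 5 bits per hour.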



An early real-world example is the list of Q-codes, created circa 1909 by the British government as a "list of abbreviations... prepared for the use of British ships and coast stations licensed by the Postmaster General".





If you need to communicate more than simple texts, you would separate a message into a command part and a message part. You could, for example, tell the space ship:




Activate X-ray sensors




By sending a signal of 2 bytes:



| byte | value | meaning            |
|------|-------|--------------------|
| 1    | 01    | activate appliance |
| 2    | 08    | X-ray sensor array |


Communication with an astronaut would be possible with a different command:



| byte | value | meaning           |
|------|-------|-------------------|
| 1    | 04    | write to terminal |
| 2    | 08    | text with ID 8    |


That would result in slightly longer commands, but the possibilities of what you can achieve with a few bytes are multiplied.
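The two-byte command format from the tables above can be sketched like this. The opcode values (01, 04) and operand (08) come from the tables; the constant names, the `APPLIANCES` dictionary and the `describe` helper are hypothetical illustrations, not a defined protocol.

```python
# Hypothetical lookup tables mirroring the two example commands above.
ACTIVATE_APPLIANCE = 0x01
WRITE_TO_TERMINAL = 0x04
APPLIANCES = {0x08: "X-ray sensor array"}


def encode_command(opcode: int, operand: int) -> bytes:
    """Pack a one-byte opcode and a one-byte operand into a two-byte command."""
    return bytes([opcode, operand])


def describe(frame: bytes) -> str:
    """Interpret a received two-byte command using the shared tables."""
    opcode, operand = frame
    if opcode == ACTIVATE_APPLIANCE:
        return f"activate appliance: {APPLIANCES[operand]}"
    if opcode == WRITE_TO_TERMINAL:
        return f"write to terminal: text with ID {operand}"
    raise ValueError(f"unknown opcode {opcode:#04x}")


if __name__ == "__main__":
    frame = encode_command(ACTIVATE_APPLIANCE, 0x08)
    print(frame.hex(), "->", describe(frame))
    print(len(frame) * 8 / 5, "hours at 5 bits per hour")
```

The same operand byte means different things depending on the opcode, which is where the multiplied possibilities come from; the price is 16 bits, a little over three hours per command at the question's rate.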





If you have a really big database with a whole lot of different texts, it might be more efficient to terminate commands with a defined code. For this approach, the database must be sorted in a way that gives the most frequent commands the lowest ID.



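Let's define 0000 as the terminator. No command ID may then contain the nibble 0000 itself, or the receiver would mistake it for the end of the command.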




  • For a very common command with the ID 6, you need to send the command's ID followed by the terminator: 0110 0000.

  • A very uncommon command with the ID 26683 would look like this: 0110 1000 0011 1011 0000.


The advantage is that you can have commands of dynamic lengths (instead of sending a whole bunch of useless 0's to fill up the static length of a command).



The disadvantage is that every command is longer than it could ideally be. So this approach only gets worthwhile when you need a great many commands.
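A sketch of the terminator scheme, assuming IDs are written as 4-bit nibbles and that no valid ID contains a 0000 nibble (the sketch rejects such IDs). The function names are hypothetical; the two IDs in the usage lines are the ones from the examples above.

```python
TERMINATOR = "0000"


def encode_id(command_id: int) -> str:
    """Write an ID as 4-bit nibbles followed by the 0000 terminator."""
    if command_id <= 0:
        raise ValueError("IDs must be positive")
    nibbles = [format(int(digit, 16), "04b") for digit in format(command_id, "x")]
    # 0000 marks the end of a command, so it may not appear inside an ID.
    if TERMINATOR in nibbles:
        raise ValueError(f"ID {command_id} contains a 0000 nibble")
    return " ".join(nibbles + [TERMINATOR])


def decode_stream(bits: str) -> list:
    """Split a received nibble stream back into IDs at each terminator."""
    stream = bits.replace(" ", "")
    ids, current = [], ""
    for i in range(0, len(stream), 4):
        nibble = stream[i:i + 4]
        if nibble == TERMINATOR:
            if current:
                ids.append(int(current, 2))
            current = ""
        else:
            current += nibble
    return ids


if __name__ == "__main__":
    print(encode_id(6))      # common command: 8 bits
    print(encode_id(26683))  # rare command: 20 bits
    print(decode_stream(encode_id(6) + " " + encode_id(26683)))
```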





After defining your command set, the next step is to make sure that you received the correct message. Losing just a single bit can change a message of "Activate X-ray sensors" into "Destroy X-ray sensors" or similar. This is usually done with a checksum, which requires some more bits to transmit.
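As an illustration of the checksum idea, here is about the simplest possible version, assuming one extra byte holding the sum of the command bytes modulo 256. This is a hypothetical choice for the sketch (a real link would more likely use a CRC), and at 5 bits per hour those 8 extra bits cost over an hour and a half per command.

```python
def add_checksum(frame: bytes) -> bytes:
    """Append a one-byte additive checksum (sum of all bytes modulo 256)."""
    return frame + bytes([sum(frame) % 256])


def verify(frame_with_checksum: bytes) -> bool:
    """Recompute the checksum over the payload and compare it with the received one."""
    payload, received = frame_with_checksum[:-1], frame_with_checksum[-1]
    return sum(payload) % 256 == received


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a transmission error by inverting one bit (bit 0 = MSB of the first byte)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (7 - bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    sent = add_checksum(bytes([0x01, 0x08]))  # "activate X-ray sensor array" + checksum
    print(verify(sent), verify(flip_bit(sent, 7)))
```

Any single flipped bit changes the sum (or the checksum byte itself), so it is always detected; two errors can cancel out, which is why stronger codes exist.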



Have a look at the difference between two common data transmission protocols for the internet: UDP and TCP. UDP is more efficient with respect to transfer rate, while TCP trades some efficiency for reliability by adding overhead for acknowledgments, sequence numbers and retransmission of lost data.















  • +1. If you have even the slightest notion of the kinds of things you want to say you can vastly reduce the amount of information required. “Tell [astronaut] that their [medical property] is [phrase detailing concerns]” could theoretically only take a couple of bytes to transmit if the parser is cleverly designed. – Joe Bloggs, 15 hours ago

  • So, basically, Q codes, created circa 1909 by the British government, to be used by maritime ships (both civilian and military) for precisely the reason in the question. As a nice bonus, language is irrelevant, so long as each party has a lookup table in their own language. – Chronocidal, 15 hours ago

  • 1 byte -> 15 commands, 2 bytes -> 255 commands...what? Why not 256 and 16384? – genesis, 15 hours ago

  • You could take AS Interface as a reference. A small scale industrial communication bus system – Alexander von Wernherr, 14 hours ago

  • If using the terminated ID approach Huffman encoding could aid in ensuring the most common commands are the more efficient. Also, you might want to add to the protocol a mechanism for including parameters with the command as a separate parameter command could easily take more space than simply allowing each command to have an arbitrary number of parameters. – FluxIX, 12 hours ago




















According to Schneier, the entropy of English text is below 1.6 bits per letter. Given a difficult constraint such as yours, I would expect people to come up with compression algorithms that get close to that.
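As a rough worked figure, assuming the question's 5 bits per hour and a coder that actually reached the 1.6 bits-per-letter bound (practical compressors fall short of it), the daily budget works out as:

```latex
\begin{aligned}
\text{capacity} &= 5\ \tfrac{\text{bits}}{\text{hour}} \times 24\ \tfrac{\text{hours}}{\text{day}} = 120\ \tfrac{\text{bits}}{\text{day}} \\
\text{at } 1.6\ \tfrac{\text{bits}}{\text{letter}}: &\quad \frac{120}{1.6} = 75\ \tfrac{\text{letters}}{\text{day}} \\
\text{at } 8\ \tfrac{\text{bits}}{\text{letter}}\ (\text{ASCII}): &\quad \frac{120}{8} = 15\ \tfrac{\text{letters}}{\text{day}}
\end{aligned}
```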



If you don't need the full power of English you might get much better compression if you can pre-define a small set of words that would be sufficient. Something similar in principle to https://xkcd.com/1133/



I think you need to answer two important questions:




  1. Is the system pre-defined, i.e. can there be word-lists?

  2. Are characters/words sent individually or can you apply compression to a large amount of data and then send it in bulk?


If you want something that is simple, sciency and requires no setup, go with Huffman-coding individual letters based on frequency in English. ;)
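A minimal Huffman-coding sketch. It derives letter frequencies from a short sample string, which is only a stand-in (an assumption): a real system would use frequencies measured on a large body of representative English, and both ends would need the same code table.

```python
import heapq
from collections import Counter


def huffman_code(frequencies: dict) -> dict:
    """Build a prefix-free Huffman code (symbol -> bit string) from symbol frequencies."""
    if len(frequencies) == 1:  # a lone symbol still needs one bit
        return {symbol: "0" for symbol in frequencies}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(weight, i, {symbol: ""}) for i, (symbol, weight) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, lighter = heapq.heappop(heap)
        w2, _, heavier = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lighter.items()}
        merged.update({s: "1" + c for s, c in heavier.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(text: str, code: dict) -> str:
    return "".join(code[ch] for ch in text)


def huffman_decode(bits: str, code: dict) -> str:
    """Read bits until they match a code word; prefix-freeness makes this unambiguous."""
    reverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    sample = "the ship is on course and the crew is well"  # stand-in for real English statistics
    code = huffman_code(Counter(sample))
    bits = huffman_encode(sample, code)
    print(len(sample) * 8, "bits as ASCII vs", len(bits), "bits Huffman-coded")
```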















  • As you can see with XKCD, it gets so wordy to convey a simple idea. May still work better than ASCII – Andrey, 6 hours ago

  • @Andrey I'm not sure I would call rocket science a "simple idea". – TheHansinator, 3 hours ago

  • @Andrey Though, in all seriousness, the words chosen for the compressed language would probably be chosen specifically for the domain - e.g. the language chosen for a spaceship probably would include words like "rocket". – TheHansinator, 3 hours ago




















You might look at digital modes for amateur radio here. Some of those modes use what's called "varicode" -- where different characters have different symbol lengths (Morse code is a varicode system -- more commonly used letters are shorter in terms of transmission time). When sending English text, a varicode will minimize the number of bits required for a sufficiently large sample (which reasonably ought to include a large number of messages). If "text speak" is used commonly, it might make sense to design the varicode used around letter frequencies in that particular text format.
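A sketch of the varicode principle, modeled on the rule PSK31's varicode follows: every code word starts and ends with 1 and never contains 00, so 00 can mark the boundary between characters without any length field. The table here is NOT the real PSK31 table; it is generated by ranking the characters of a hypothetical sample message, purely to show the mechanism.

```python
from collections import Counter
from itertools import count, product


def varicode_words():
    """Yield bit strings that start and end with 1 and contain no "00", shortest first."""
    for length in count(1):
        for bits in product("01", repeat=length):
            word = "".join(bits)
            if word[0] == "1" and word[-1] == "1" and "00" not in word:
                yield word


def build_varicode(sample: str) -> dict:
    """Give the most frequent characters in the sample the shortest code words."""
    ranked = [ch for ch, _ in Counter(sample).most_common()]
    return dict(zip(ranked, varicode_words()))


def varicode_encode(text: str, code: dict) -> str:
    """Separate characters with "00", which can never occur inside a code word."""
    return "00".join(code[ch] for ch in text)


def varicode_decode(bits: str, code: dict) -> str:
    reverse = {word: ch for ch, word in code.items()}
    return "".join(reverse[word] for word in bits.split("00"))


if __name__ == "__main__":
    code = build_varicode("status nominal, all systems go")
    print(varicode_encode("all systems go", code))
```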



If longer messages are common, some form of compression would make sense -- text typically compresses well with common compression algorithms, but the compression headers make this inefficient for very small blocks of data (text or otherwise).
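The header overhead is easy to see with Python's standard `zlib` module, whose output always wraps the compressed data in a 2-byte header and a 4-byte Adler-32 checksum. The two message strings are made up for the illustration.

```python
import zlib


def compressed_size(text: str) -> int:
    """Bytes in the zlib stream: 2-byte header + deflate data + 4-byte Adler-32 trailer."""
    return len(zlib.compress(text.encode("ascii"), 9))


short_message = "ALL OK"
long_log = "thrusters nominal, life support nominal, comms nominal. " * 40

if __name__ == "__main__":
    for text in (short_message, long_log):
        print(len(text), "bytes in ->", compressed_size(text), "bytes out")
```

The short status grows when "compressed", while the long, repetitive log shrinks dramatically.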




























    Textspeak



    SMS messages were originally limited to 160 characters, so textspeak evolved to reduce everything to its most compact form through abbreviations, acronyms and emoticons.



    Sounds like a good reason to send teenagers into space....



















    • I'm pretty sure that style of writing predates common SMS usage. I recall half the people on the internet talking that way in the late 90s, a time when most people did not have internet but even fewer people had the ability to send text messages. – Aaron, 8 hours ago

    • Was previously beeper speak: 143 133 43 43 5318008 – RIanGillis, 6 hours ago




















    Building on other answers



    In addition to the different encoding and compression methods, one thing to look into is shorthand techniques that allow you to drop letters while still being able to interpret the message. Some examples:




    • it, to -> t

    • is -> s

    • have -> hv

    • cat -> ct

    • are -> r


    Example sentence: hw r u?



    An alternative approach



    Encode your information in time delays



    Presumably there is some reason that you can't speed up the data transmission, but perhaps you can slow it down. At 5 bits per hour, there are 12 minutes between consecutive bits. Instead of sending each bit at a regular interval, you can delay the transmission of bits and use the length of each delay to convey information.



    So let's say you expect a minimum of 12 minutes between each bit, you can encode the data as follows (time is in mm:ss format):




    • 12:00 = 0

    • 12:05 = 1

    • 12:10 = 2

    • 12:15 = 3

    • etc


    The more distinct delays you use, the longer the average gap becomes and the fewer bits per day you'll be able to transmit, so there will be some optimal balance you'll have to find based on the minimum delay interval you consider acceptable. Then you can perhaps use the bits themselves as an error-checking mechanism, or still use them to transmit data.
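A sketch of the delay encoding, using the 12-minute base gap and 5-second steps from the list above. The 5-second resolution is an assumption about how precisely the receiver can time arrivals, and `extra_bits_per_hour` assumes every delay value is equally likely; it counts only the information carried by the timing, not the bits themselves.

```python
import math

BASE_DELAY_S = 12 * 60  # 12-minute minimum gap between bits (5 bits per hour)
STEP_S = 5              # assumed timing resolution the receiver can rely on


def delay_for(value: int) -> int:
    """Gap in seconds before the next bit that encodes `value`."""
    return BASE_DELAY_S + STEP_S * value


def value_for(delay_s: float) -> int:
    """Recover the value, rounding to the nearest step to tolerate small jitter."""
    return round((delay_s - BASE_DELAY_S) / STEP_S)


def as_mmss(seconds: int) -> str:
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def extra_bits_per_hour(levels: int) -> float:
    """Information carried by the timing alone, with `levels` equally likely delays."""
    mean_gap_s = BASE_DELAY_S + STEP_S * (levels - 1) / 2
    return math.log2(levels) * 3600 / mean_gap_s


if __name__ == "__main__":
    print(as_mmss(delay_for(3)))
    for levels in (2, 16, 64, 256, 4096):
        print(levels, "levels:", round(extra_bits_per_hour(levels), 1), "extra bits/hour")
```

The timing rate rises and then falls as the number of levels grows, which is the "optimal balance" mentioned above; under these assumptions it peaks somewhere around 64 levels.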




























      Not Morse code



      From Wikipedia:




      International Morse code is composed of five elements:[1]




      1. short mark, dot or "dit" (▄): "dot duration" is one time unit long

      2. longer mark, dash or "dah" (▄▄▄): three time units long

      3. inter-element gap between the dots and dashes within a character: one dot duration or one unit long

      4. short gap (between letters): three time units long

      5. medium gap (between words): seven time units long




      If we use one bit to store one unit of information, it takes four bits to transmit even the shortest letter ('e') and its subsequent gap. The next shortest are 'i' and 't' at six bits. Then 'a', 'n', and 's' at eight. The longest character in the Morse alphabet is 0, which requires five dashes or twenty-two units/bits. And that only supports the thirty-six character latin alphanumeric alphabet.
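The unit counts above can be checked with a short script, assuming one bit per time unit and counting the 3-unit gap after each character. Only a handful of characters from the International Morse table are included.

```python
# A few entries of the International Morse table (enough for the characters discussed).
MORSE = {"e": ".", "i": "..", "t": "-", "a": ".-", "n": "-.", "s": "...", "h": "....", "0": "-----"}

DOT, DASH, INTRA_GAP, LETTER_GAP = 1, 3, 1, 3  # durations in time units


def units_for(char: str) -> int:
    """Time units (one bit each) for a character plus the gap that follows it."""
    symbols = MORSE[char]
    marks = sum(DOT if s == "." else DASH for s in symbols)
    gaps = INTRA_GAP * (len(symbols) - 1)
    return marks + gaps + LETTER_GAP


if __name__ == "__main__":
    for char in MORSE:
        print(char, units_for(char))
```

Even "h", made of dots only, already needs 10 units, more than an 8-bit character.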



      Morse is designed around humans. Humans do better with indeterminate length than fixed length, as we don't have good timing ability (we can't consistently tell a five-unit pause from a four-unit pause). But if these messages are being transmitted computer to computer, computers have great timing ability, so we can use superior fixed-length formats. Heck, even with humans, twelve-minute units make it easy to tell whether you're getting a pause or a dot (a zero or a one).



      Even worse, if you are transmitting Morse over bits, (extended) ASCII's fixed eight bits per character is shorter than the Morse encoding of every character except the 'eitans' (e, i, t, a, n and s).



      Bits



      Meanwhile, if we transmit ASCII, we can send the character 0 in eight bits. If we split each character into nybbles, we can transmit one nybble plus a check (parity) bit every hour: two hours per character with some error detection included, or ninety-six minutes without the check bits.
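A sketch of the nybble framing described above, assuming even parity as the single check bit: each 8-bit character becomes two 5-bit frames, i.e. one frame per hour. The helper names are hypothetical.

```python
def parity(nibble: str) -> str:
    """Even-parity bit: '1' when the nibble holds an odd number of ones."""
    return str(nibble.count("1") % 2)


def frame_char(ch: str) -> list:
    """Split one 8-bit character into two 5-bit frames (4 data bits + 1 parity bit)."""
    byte = format(ord(ch), "08b")
    return [nibble + parity(nibble) for nibble in (byte[:4], byte[4:])]


def unframe(frames: list) -> str:
    """Rebuild the character, refusing any frame whose parity does not match."""
    data = ""
    for frame in frames:
        nibble, check = frame[:4], frame[4]
        if parity(nibble) != check:
            raise ValueError(f"parity error in frame {frame}")
        data += nibble
    return chr(int(data, 2))


if __name__ == "__main__":
    frames = frame_char("0")
    print(frames, "->", unframe(frames))
```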



      If we instead use ten-bit symbols (two hours each), we can do something like Lempel-Ziv. The first 256 symbols are the extended ASCII set; the remaining 768 symbols represent multiple characters, so common sequences (e.g. "the ", "ing", and "tion") get their own ten-bit representation, e.g. 0100000000. This keeps the full flexibility of ASCII while producing a shorter message on average.
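For the ten-bit idea, one concrete Lempel-Ziv variant is LZW, which starts with the 256 single-byte symbols and learns multi-byte sequences as the message goes by. This sketch caps the dictionary at 1024 entries so every output symbol fits in ten bits; it is textbook LZW, not a tuned implementation.

```python
MAX_CODES = 1 << 10  # ten-bit symbols: 256 single bytes + 768 learned sequences


def lzw_compress(data: bytes) -> list:
    """Emit dictionary indices, learning a new sequence after each output."""
    dictionary = {bytes([i]): i for i in range(256)}
    current, output = b"", []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate
        else:
            output.append(dictionary[current])
            if len(dictionary) < MAX_CODES:
                dictionary[candidate] = len(dictionary)
            current = bytes([value])
    if current:
        output.append(dictionary[current])
    return output


def lzw_decompress(codes: list) -> bytes:
    """Rebuild the same dictionary on the receiving side while decoding."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:  # code defined by the very step being decoded (e.g. runs like "aaaa")
            entry = previous + previous[:1]
        result.append(entry)
        if len(dictionary) < MAX_CODES:
            dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(result)


if __name__ == "__main__":
    message = b"the engine and the thruster and the antenna"
    codes = lzw_compress(message)
    print(len(message) * 8, "bits as ASCII vs", len(codes) * 10, "bits as ten-bit LZW symbols")
```

Because LZW learns from the message itself, short messages gain little; the gain grows with repetition.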



      The Lempel-Ziv algorithm builds the dictionary from the message itself. We can do better by agreeing on a dictionary beforehand. You can also use this to integrate the error correction and the dictionary, which improves your effective speed.
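A pre-agreed dictionary can be sketched as a fixed symbol table shared by both ends: the 256 byte values followed by a few common sequences, matched greedily (longest first). The phrase list is hypothetical; each output index would be sent as a ten-bit code.

```python
# Hypothetical phrase list agreed before launch; a real one would be tuned on actual traffic.
PHRASES = ["the ", "ing", "tion", "and ", "status ", "nominal"]
SYMBOLS = [bytes([i]) for i in range(256)] + [p.encode("ascii") for p in PHRASES]
INDEX = {symbol: i for i, symbol in enumerate(SYMBOLS)}
LONGEST = max(len(symbol) for symbol in SYMBOLS)
assert len(SYMBOLS) <= 1024  # every index fits in ten bits


def encode_static(text: str) -> list:
    """Greedy longest match against the shared table; single bytes always match."""
    data = text.encode("ascii")
    symbols, pos = [], 0
    while pos < len(data):
        for size in range(min(LONGEST, len(data) - pos), 0, -1):
            chunk = data[pos:pos + size]
            if chunk in INDEX:
                symbols.append(INDEX[chunk])
                pos += size
                break
    return symbols


def decode_static(symbols: list) -> str:
    return b"".join(SYMBOLS[i] for i in symbols).decode("ascii")


if __name__ == "__main__":
    message = "the station status nominal"
    symbols = encode_static(message)
    print(len(message) * 8, "bits as ASCII vs", len(symbols) * 10, "bits as ten-bit symbols")
```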



      Numbers are generally better sent as binary values than as characters. For example, instead of sending the ASCII string "3840" (32 bits), just send 111100000000. That's only twelve bits, hardly more than a single character.
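The comparison is easy to reproduce. The catch, which this sketch leaves out, is that the receiver must know how many bits the number occupies (a fixed field width or a length prefix, both assumptions a real protocol would have to pin down).

```python
def ascii_bits(number: int) -> int:
    """Cost of sending a non-negative number as decimal ASCII digits."""
    return 8 * len(str(number))


def binary_bits(number: int) -> int:
    """Cost of sending the same number as an unsigned binary value."""
    return max(1, number.bit_length())


if __name__ == "__main__":
    print(format(3840, "b"), ascii_bits(3840), binary_bits(3840))
```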




























        A receiver will pick up the radio signal plus background noise (most notably the cosmic background radiation). Generally, the received noise power grows with the receiver bandwidth. So to get a good signal-to-noise ratio, one can transmit the radio signal within a very narrow frequency band and put a very narrow band-pass filter on the front end of the receiver.



        EXAMPLE: The receiver picks up 1 microwatt of radio signal and 1 milliwatt of noise power with a 1 MHz bandwidth (an SNR of 0.001).



        Dropping the bandwidth to 10 Hz would result in 1 microwatt of radio signal power and 10 nanowatts of received noise power (an SNR of 100).
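The arithmetic behind the example, assuming white noise (noise power proportional to bandwidth) and that all of the signal power still fits inside the 10 Hz filter:

```latex
\begin{aligned}
\text{SNR}_{1\,\text{MHz}} &= \frac{1\ \mu\text{W}}{1\ \text{mW}} = 10^{-3} \quad (-30\ \text{dB}) \\
P_{\text{noise},\,10\,\text{Hz}} &= 1\ \text{mW} \times \frac{10\ \text{Hz}}{10^{6}\ \text{Hz}} = 10\ \text{nW} \\
\text{SNR}_{10\,\text{Hz}} &= \frac{1\ \mu\text{W}}{10\ \text{nW}} = 100 \quad (+20\ \text{dB})
\end{aligned}
```

The 50 dB improvement is exactly the factor of 10^5 by which the bandwidth shrank.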



        Consider a protocol like PSK31 (or similar) used by ham radio operators instead of Morse code.



        PSK31 sends its 1s and 0s as phase changes of a single pure tone at a slow rate of 31.25 symbols per second. The longer each symbol lasts, the narrower the filter at the receiver can be. PSK modes like this can be used to send low-data-rate messages across the planet using only a few watts of power.



        Another alternative (though more complex) is to use long strings of physical 1s and 0s to represent a single symbol in the protocol. GPS uses this method, for example. The GPS signal is normally about 30X lower power than the background noise, but by correlating against long codes (the C/A code repeats every 1,023 chips) the receiver is still able to lock onto the signal, because the correlation averages out the noise.



        EXAMPLE: Define a long sequence of physical 1s and 0s for each letter of the alphabet. Each code is very different from the other codes.

        Let A be 00101010 10001010 10100101 00101010 ...

        Let B be 10100001 10100101 00010101 00010100 ...

        Let C be 01001010 01010100 00010100 00110101 ...



        The sequences may be thousands of bits long if you want. The patterns are generated by a computer automatically when the user types a letter on the keyboard.



        The physical bit sequences are sent at a much higher rate than the actual symbols. For example, if you want to send one symbol per second and your sequences are 1000 bits long, then you send the physical bits at 1000 bits per second.



        When receiving the signal plus noise, the noise will cause the receiver to make the wrong decision on the physical 1s and 0s some percentage of the time. The receiver stores the received bit pattern, compares it to each of the codes, and selects the code that most closely matches the received pattern. Even if a large fraction of the received bits are wrong, the received pattern is still likely to match the code sent by the transmitter more closely than any of the other codes. Thus the receiver can determine what the transmitter sent even if the background noise is much stronger than the received radio signal.
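A simplified simulation of the idea, using hard 0/1 decisions rather than the analog correlation real GPS or CDMA receivers perform. The code length, the random seeds, the three-letter alphabet and the 40% bit-error rate are all assumptions chosen for illustration.

```python
import random

CODE_LENGTH = 2000          # physical bits per letter (assumed)
shared = random.Random(42)  # both ends derive the codes from the same seed (assumed)

# One pseudo-random spreading sequence per letter.
CODES = {letter: [shared.randint(0, 1) for _ in range(CODE_LENGTH)] for letter in "ABC"}


def transmit(letter: str, error_rate: float, channel: random.Random) -> list:
    """Send a letter's sequence through a channel that flips each bit with probability error_rate."""
    return [bit ^ (channel.random() < error_rate) for bit in CODES[letter]]


def receive(bits: list) -> str:
    """Pick the letter whose code agrees with the received bits in the most positions."""
    def agreement(letter: str) -> int:
        return sum(a == b for a, b in zip(bits, CODES[letter]))
    return max(CODES, key=agreement)


if __name__ == "__main__":
    channel = random.Random(7)
    received = [transmit(letter, 0.4, channel) for letter in "CAB"]
    errors = sum(a != b for letter, bits in zip("CAB", received) for a, b in zip(bits, CODES[letter]))
    print(f"physical bit error rate: {errors / (3 * CODE_LENGTH):.0%}")
    print("decoded:", "".join(receive(bits) for bits in received))
```

The correct code agrees with the received bits in about 60% of positions, an unrelated code in about 50%; with 2000-bit codes that gap is many standard deviations wide, so the letter comes through despite the noisy physical bits.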



        Another advantage of long codes is that they inherently correct physical bit errors at the receiver. Also, different transmitters can each use different code sets so they can talk at the same time (this approach is how CDMA cell phones work).



















        • Yeah, something similar to PSK31, or maybe JT65, FT8 or some such for their low S/N characteristics, was my first thought as well. One benefit of it, and many similar encoding schemes (I'm not sure I really want to call it a modulation per se, though one could make an argument that it's a baseband modulation) is that they use variable-length encodings. That requires some kind of synchronization, but allows the more commonly used code points to be encoded more compactly and thus transmitted more quickly. – a CVn, 9 hours ago





















        1. Encode whole words instead of single letters.

        2. Use Huffman encoding based on word frequency in the specific context of space travel, so that frequent words ('the', 'yes', 'shields') use fewer bits than less frequent words.

        3. Use Markov chains to take the context of the sentence into account as well.
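To show why context helps, this sketch compares ideal code lengths (-log2 of each word's probability, which arithmetic coding approaches and Huffman coding approximates in whole bits) under a plain word-frequency model and a first-order Markov model. The tiny training text is hypothetical, and scoring a model on its own training text flatters it.

```python
import math
from collections import Counter


def unigram_bits(words: list) -> float:
    """Ideal code length if each word is coded by its overall frequency."""
    counts = Counter(words)
    return sum(-math.log2(counts[w] / len(words)) for w in words)


def bigram_bits(words: list) -> float:
    """Ideal code length if each word is coded given the previous word (first-order Markov chain)."""
    unigrams = Counter(words)
    pairs = Counter(zip(words, words[1:]))
    as_predecessor = Counter(words[:-1])
    bits = -math.log2(unigrams[words[0]] / len(words))  # first word has no context
    for prev, word in zip(words, words[1:]):
        bits += -math.log2(pairs[(prev, word)] / as_predecessor[prev])
    return bits


if __name__ == "__main__":
    log = "shields up shields down hull intact hull intact".split()
    print(f"word frequencies only: {unigram_bits(log):.1f} bits")
    print(f"with previous-word context: {bigram_bits(log):.1f} bits")
```

Once the model knows "hull" is always followed by "intact", the second word costs nothing; only genuinely uncertain choices (up or down after "shields") consume bits.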




























          Oops! When I was writing this, I forgot you said "5 bits per hour" and was thinking "5 bits per day"... read all instances of "day" below as "hour". I'm leaving this message temporarily in case I missed any instances.



          Here is the literal answer to your question:



          Use base64 character encoding. This lets you represent English letters and numbers, which is precisely what you said you wanted, using 6 bits per character, which is just 1 bit more than one hour's worth of transmission in your circumstances.
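Note that this uses the base64 alphabet as a 6-bit character set; base64 encoding of existing 8-bit bytes would make messages longer, not shorter. Standard base64 has "+" and "/" rather than a space, so this sketch swaps in space and "." (a hypothetical choice) so that ordinary sentences can be sent.

```python
import string

# Hypothetical 64-symbol set: base64's letters and digits, with space and "."
# standing in for base64's "+" and "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + " ."
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def to_bits(text: str) -> str:
    """Six bits per character: the character's position in the alphabet."""
    return "".join(format(INDEX[ch], "06b") for ch in text)


def from_bits(bits: str) -> str:
    return "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))


if __name__ == "__main__":
    bits = to_bits("Hey Bob")
    print(len(bits), "bits:", bits)
```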



          And here are some enhancements to that by adding special "modes"...



          This includes both upper and lower case letters. If you are fine with restricting yourself to one case, which would still fulfill your requirements, then you would have room left to include more punctuation or other enhancements (up to 26 other enhanced transmission modes). I would recommend using some of this extra space to represent some extremely common words or short phrases that you would use very, very often. Then use a few of the character slots for other special meanings, such as "the next few bytes represent status codes" or "the following data is compressed".



          Mode 1: Table of most common words or phrases



          For the examples below, I'll assume that 2 of the characters select two different word/phrase lookup tables, each indexed with 9 bits, since this makes a lookup take exactly 3 hours to send, including the initial 6 bits (6 + 9 = 15 bits = 3 hours). That gives 512 entries per table, times 2 tables: 1024 different shortened words or phrases.
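A sketch of the cost arithmetic for this scheme, assuming 6 bits per plain character and 15 bits (mode character plus 9-bit index) per lookup. The table contents are hypothetical, and the message is split into tokens by hand.

```python
PLAIN_BITS = 6       # one character from the 64-symbol set
LOOKUP_BITS = 6 + 9  # lookup-mode character + 9-bit index = 15 bits (3 hours)
BITS_PER_HOUR = 5

# Hypothetical table entries; the real tables would hold up to 2 x 512 words or phrases.
LOOKUP = {"communication array", "damaged", "Klingon"}


def message_bits(tokens: list) -> int:
    """Table entries cost 15 bits each; anything else is spelled out at 6 bits per character."""
    return sum(LOOKUP_BITS if token in LOOKUP else PLAIN_BITS * len(token) for token in tokens)


def as_hours(bits: int) -> str:
    hours, leftover = divmod(bits, BITS_PER_HOUR)
    return f"{hours} hours + {leftover} bits"


if __name__ == "__main__":
    print(as_hours(message_bits(["Hey Bob"])))
    print(as_hours(message_bits(["Klingon"])), "vs", as_hours(PLAIN_BITS * len("Klingon")))
```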



          Using this format...



          "Hey Bob" as plain text requires 6*7=42 bits = 8 hours + 2 bits



          "communication array damaged by [reason]", assuming "communication array" and "damaged" are both in the lookup table, would take 9 hours + 3 bits + [however long it takes to send the reason]. "communication array damaged by Klingon torp" would take 24 hours, less if either "Klingon" or "torp" were sent as lookup words instead of as plain text.
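A minimal cost calculator for the lookup mode. It assumes each table entry carries its own trailing space and that the message has already been split into tokens; under that reading the two figures above come out exactly (2 × 15 + 3 × 6 = 48 bits, then + 12 × 6 = 120 bits). The `LOOKUP` contents are hypothetical.

```python
MODE_BITS = 6       # the base64 slot that switches into lookup mode
INDEX_BITS = 9      # 512 entries per table
PLAIN_BITS = 6      # one plain-text character
BITS_PER_HOUR = 5

# Hypothetical table entries; each one includes its trailing space.
LOOKUP = {"communication array ", "damaged "}

def message_bits(tokens: list[str]) -> int:
    """Cost of a tokenized message: table hits vs. plain characters."""
    total = 0
    for tok in tokens:
        if tok in LOOKUP:
            total += MODE_BITS + INDEX_BITS   # 6 + 9 = 15 bits = 3 hours
        else:
            total += PLAIN_BITS * len(tok)    # spelled out character by character
    return total

short = ["communication array ", "damaged ", "by "]
full = short + ["Klingon torp"]
for tokens in (short, full):
    n = message_bits(tokens)
    print(n, "bits =", divmod(n, BITS_PER_HOUR), "(hours, bits)")
```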



          Mode 2: Look-back



          This is a "repeat recent word" mode. In computer science, it has been shown that recently used data is among the most likely data to be used next, and that is what a PC memory cache is for. We can do something similar by making 1 of the character slots represent "The next 4 bits refer to a previous word; count back that many words in the most recent transmitted data."



          With this, "Klingon fleet approaches from 294 and Klingon admiral on comms saying Klingon destroyers equipped with new black hole tech" allows you to shorten 2 instances of "Klingon" to exactly 2 hours' worth of data each (6 + 4 = 10 bits); the first one uses "0110" (6) as the 4-bit look-back value and the second uses "0101" (5). In some communications this could save a lot of time if words are repeated often. Note that pronouns like "their" could have been used in some places, but that would take 7 plain-text characters (including surrounding spaces), which in this case would have taken 6 hours + 2 bits longer to send.
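A minimal sketch of the look-back mode, assuming words are separated by single spaces and each reference points at the nearest earlier copy of the word. It also assumes a reference is only used when it is cheaper than spelling the word out, which matters for one-letter words. It reproduces the two distances from the example.

```python
MODE_BITS, DIST_BITS, CHAR_BITS = 6, 4, 6
MAX_BACK = 2 ** DIST_BITS - 1  # 4 bits can point back at most 15 words

def lookback_encode(text: str) -> list[tuple[str, object]]:
    """Replace a word with ('ref', distance) if it appeared in the last 15 words."""
    words = text.split()
    out = []
    for i, word in enumerate(words):
        ref = None
        # Scan backwards, nearest word first, but no further than MAX_BACK.
        for j in range(i - 1, max(-1, i - MAX_BACK - 1), -1):
            if words[j] == word:
                ref = i - j
                break
        # Only worth it when the reference (10 bits) beats spelling the word out.
        if ref is not None and MODE_BITS + DIST_BITS < CHAR_BITS * len(word):
            out.append(("ref", ref))
        else:
            out.append(("word", word))
    return out

msg = ("Klingon fleet approaches from 294 and Klingon admiral on comms "
       "saying Klingon destroyers equipped with new black hole tech")
refs = [(kind, val) for kind, val in lookback_encode(msg) if kind == "ref"]
print(refs)                                 # [('ref', 6), ('ref', 5)]
print([format(d, "04b") for _, d in refs])  # ['0110', '0101']
```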



          Mode 3: Copy/paste, possibly with separate paste-buffers



          This would allow customized shortcuts that were not thought of before launch, a sort of copy/paste: "start copying here", then continue the message, then "stop copying here"; later you can send a single "paste" character to repeat a long message. For example, send "comms good, thrusters good, life support good, magnetic-artificial-gravity [copy]working intermittently due to a swarm of flies in the grav-capacitor[end copy] from a meal someone left out for days". You only have to send that long text once; each time after that you send "comms good, thrusters good, life support good, magnetic-artificial-gravity [paste]", and you keep doing so until the status changes back to "good".

          Also, "comms", "thrusters", "life support", "magnetic-artificial-gravity", and "good" might all be in the lookup table, meaning this entire message takes 22 hours + 1 bit to send after the first time you send it.

          Even better is to follow the "paste" character with a few bits giving a "paste buffer number". Then you could send "[copy1]comms good, [etc.], [copy2]working intermittently due to...[end-copy-2][end-copy-1] from a meal that..." From then on, every time you want to send an updated status that is the same as the previous one, it takes 1 hour plus a few bits.
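A minimal receiver-side sketch of numbered paste buffers. The token format (plain strings for text, `("copy", n)`, `("end", n)`, `("paste", n)` tuples for the control characters) and the `Receiver` class are hypothetical, and bit costs are not modelled; the point is that buffers survive between transmissions and nested copies record simultaneously.

```python
class Receiver:
    """Keeps paste buffers alive between transmissions (hypothetical protocol)."""

    def __init__(self) -> None:
        self.buffers: dict[int, str] = {}
        self.open: set[int] = set()

    def receive(self, tokens: list) -> str:
        out = []
        for tok in tokens:
            if isinstance(tok, str):          # literal text
                text = tok
            elif tok[0] == "copy":            # start recording into buffer n
                self.buffers[tok[1]] = ""
                self.open.add(tok[1])
                continue
            elif tok[0] == "end":             # stop recording into buffer n
                self.open.discard(tok[1])
                continue
            elif tok[0] == "paste":           # replay buffer n
                text = self.buffers[tok[1]]
            else:
                raise ValueError(f"unknown command {tok[0]!r}")
            out.append(text)
            for b in self.open:               # every open (nested) copy records
                self.buffers[b] += text
        return "".join(out)

rx = Receiver()
first = rx.receive([
    ("copy", 1), "comms good, gravity ",
    ("copy", 2), "working intermittently", ("end", 2),
    ("end", 1), " due to flies",
])
second = rx.receive([("paste", 1)])  # unchanged status: a single paste
third = rx.receive(["comms good, gravity ", ("paste", 2)])
print(first, second, third, sep="\n")
```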



          You can tweak the exact representation (which send modes exist, how many bits each uses, etc.) to fit your expected communications and improve performance further.



          Mode 4: Compression



          Just what it sounds like. The following data uses a given compression algorithm.
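The answer leaves the algorithm open; this sketch assumes one possible choice, DEFLATE via Python's `zlib` module with a dictionary agreed before launch (`zdict`). `PRESET` and its phrases are hypothetical. Raw deflate (`wbits=-15`) skips the zlib header and checksum, which matter when every bit costs airtime.

```python
import zlib
from typing import Optional

# Hypothetical dictionary agreed before launch: phrases expected to recur.
# zlib's docs advise putting the most common phrases at the END.
PRESET = (b"shields damaged. Klingon fleet approaches. "
          b"comms good, thrusters good, life support good, "
          b"magnetic-artificial-gravity good. ")

def compress(msg: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Raw deflate (wbits=-15): no zlib header or checksum to pay for."""
    kwargs = {"zdict": zdict} if zdict else {}
    c = zlib.compressobj(level=9, wbits=-15, **kwargs)
    return c.compress(msg) + c.flush()

def decompress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """The receiver must use exactly the same dictionary as the sender."""
    kwargs = {"zdict": zdict} if zdict else {}
    d = zlib.decompressobj(wbits=-15, **kwargs)
    return d.decompress(data) + d.flush()

msg = b"comms good, thrusters good, life support good, "
plain = compress(msg)
primed = compress(msg, PRESET)
print(len(msg), len(plain), len(primed))  # bytes: raw / no dict / with dict
```

Without a shared dictionary, short messages barely compress at all, which is why the pre-agreed table does most of the work here.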



          Multiple versions of each mode



          If you drop lower case and still have character slots left over after implementing all the punctuation and modes you want, you can use the leftover slots to expand modes that you already have. This is what I did above with the lookup tables: I suggested using 2 of the base64 character slots freed up by dropping lower case to get 2 separate lookup tables. You could do the same to double your look-back reach or your number of copy/paste buffers. You can also increase or decrease the number of bits following a mode character, such as 4 bits after copy/paste for 16 paste buffers, or only 2 extra bits to save transmission time at the cost of allowing only 4 paste buffers.



          So how efficient is this?



          In the worst case this requires 6 bits per character. In the average case you will use a few lookups, look-backs, or some compression to beat the worst case, so you need 3-5 bits per character. The best case is messages that can be relayed entirely, or nearly entirely, by lookups and look-backs, which will be common for normal day-to-day activities that go as expected. For such common communication, if you have a well-tuned number of bits for each special mode, you should achieve better than 1 bit per character, often much, much better, as with the status report example a couple of paragraphs up.



















          • Your modes 1-3 are already part of common LZ-derived compression algorithms. The "table of common phrases" strategy, for example, is exposed by zlib by methods to set a pre-agreed "dictionary". (Some additional tweaking to the algorithm could probably be applied to make it squeeze the last few bits out of short messages, though). – Henning Makholm, 5 hours ago




















          Huffman Encoding



          Basically, you want the same technique we use today when writing a .zip file (its DEFLATE compression combines Huffman coding with LZ77 matching). In simplified terms, we take the most common character in the file (probably 'e') and say that it corresponds to the bit '1'. Then the next most common one ('a' maybe?) will be '01', and the next most common (let's say 't') will be '001'.



          So, given this system, "eat" = "101001", while "tea" = "001101".
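A minimal sketch of encoding and decoding with the three-letter table from the example (a real table would cover every character). It shows why no separators are needed: no codeword is the beginning of another, so the receiver can cut the bit stream as soon as it matches a codeword.

```python
# Codeword table from the example: the commoner the letter, the shorter.
CODE = {"e": "1", "a": "01", "t": "001"}
DECODE = {bits: ch for ch, bits in CODE.items()}

def encode(word: str) -> str:
    return "".join(CODE[ch] for ch in word)

def decode(bits: str) -> str:
    """Accumulate bits until they form a codeword; valid because the code is prefix-free."""
    out, current = [], ""
    for b in bits:
        current += b
        if current in DECODE:
            out.append(DECODE[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

print(encode("eat"), encode("tea"))  # 101001 001101
print(decode("101001"))              # eat
```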



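          Among codes that give every character its own codeword of whole bits, Huffman coding is the most efficient possible for a given set of character frequencies: it gives you access to any number of characters while still using very few bits for the vast majority of the ones you actually use.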



          Note though: this is most effective when some letters/characters are used far more than other ones (as it is in modern English).



          Also, a .zip file usually stores a "dictionary" (the table of bit combinations and characters) alongside the compressed data, so the receiver can translate back out of it. Sending this every time is wasteful, especially for short messages. However, if every user has a well-known dictionary built to best represent common English usage, it can work.







            The most efficient communication is probably a command set. Since you contemplated Morse code, I assume that the communication is done via a fully defined interface - both sender and receiver know what a bit sequence is supposed to mean.



            A command set is no more than giving different codes predefined meanings. With one single bit you can define 2 commands:



            | value | meaning   |
            | ----- | --------- |
            | 0     | light off |
            | 1     | light on  |


            With 4 bits you can define 16 different commands, with 1 byte (8 bits) 256 commands, with 2 bytes 65,536 commands, and so on (n bits give 2^n codes). If all you really need is to display texts to an astronaut, you can store a bunch of ready-made texts like "Activate X-ray sensors" in a database and send the corresponding message ID from Earth. For more complex messages you can store text templates in a database and then compile a message from several templates.



            An early real-world example is the list of Q-Codes, created circa 1909 by the British government as a "list of abbreviations... prepared for the use of British ships and coast stations licensed by the Postmaster General".





            If you need to communicate more than simple texts, you would separate a message into a command part and a message part. You could, for example, tell the space ship:




            Activate X-ray sensors




            By sending a signal of 2 bytes:



            | byte | value | meaning            |
            | ---- | ----- | ------------------ |
            | 1    | 01    | activate appliance |
            | 2    | 08    | X-ray sensor array |


            Communication with an astronaut would be possible with a different command:



            | byte | value | meaning           |
            | ---- | ----- | ----------------- |
            | 1    | 04    | write to terminal |
            | 2    | 08    | text with ID 8    |


            That would result in slightly longer commands, but the possibilities of what you can achieve with a few bytes are multiplied.
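A minimal sketch of decoding the 2-byte frames from the tables above. The lookup tables `COMMANDS`, `APPLIANCES` and `TEXTS` are hypothetical, and so is the text stored under ID 8; what matters is that sender and receiver carry identical copies, so only the IDs travel.

```python
# Hypothetical tables; sender and receiver must carry identical copies.
COMMANDS = {0x01: "activate appliance", 0x04: "write to terminal"}
APPLIANCES = {0x08: "X-ray sensor array"}
TEXTS = {0x08: "Activate X-ray sensors"}

def decode_command(frame: bytes) -> str:
    """Interpret a 2-byte frame: byte 1 picks the command, byte 2 its argument."""
    if len(frame) != 2:
        raise ValueError("expected exactly 2 bytes")
    cmd, arg = frame  # iterating over bytes yields ints
    if cmd == 0x01:
        return f"{COMMANDS[cmd]}: {APPLIANCES[arg]}"
    if cmd == 0x04:
        return f"{COMMANDS[cmd]}: {TEXTS[arg]}"
    raise ValueError(f"unknown command {cmd:#04x}")

print(decode_command(bytes([0x01, 0x08])))  # activate appliance: X-ray sensor array
print(decode_command(bytes([0x04, 0x08])))  # write to terminal: Activate X-ray sensors
```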





            If you have a really big database with a whole lot of different texts, it might be more efficient to terminate commands with a defined code. For this approach, the database must be sorted in a way that gives the most frequent commands the lowest ID.



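            Let's define 0000 as the terminator. This means no 4-bit group inside an ID may itself be 0000, or the receiver would mistake it for the end of the command.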




            • For a very common command with the ID 6, you need to send the command's ID followed by the terminator: 0110 0000.

            • A very uncommon command with the ID 26683 would look like this: 0110 1000 0011 1011 0000.


            The advantage is that you can have commands of dynamic lengths (instead of sending a whole bunch of useless 0's to fill up the static length of a command).



            The disadvantage is that every command is longer than it could ideally be. So this approach only gets worthwhile when you need a great many commands.
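A minimal sketch of the terminator scheme, reproducing both bullet examples. It assumes IDs are sent as 4-bit groups (one hex digit each), most significant first. This sketch simply refuses IDs that contain a 0000 group, since the database would have to skip them.

```python
TERMINATOR = "0000"

def encode_id(command_id: int) -> str:
    """Send the ID as 4-bit groups followed by the 0000 terminator."""
    nibbles = format(command_id, "x")  # one hex digit = one 4-bit group
    if "0" in nibbles:
        # A 0000 group inside the ID would be read as the end of the command.
        raise ValueError(f"ID {command_id} contains a 0000 group")
    groups = [format(int(d, 16), "04b") for d in nibbles]
    return " ".join(groups + [TERMINATOR])

def decode_stream(bits: str) -> list[int]:
    """Split a stream of back-to-back commands into IDs."""
    groups = bits.replace(" ", "")
    ids, current = [], ""
    for i in range(0, len(groups), 4):
        group = groups[i:i + 4]
        if group == TERMINATOR:
            ids.append(int(current, 2))
            current = ""
        else:
            current += group
    return ids

stream = " ".join([encode_id(6), encode_id(26683)])
print(stream)                 # 0110 0000 0110 1000 0011 1011 0000
print(decode_stream(stream))  # [6, 26683]
```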





            After defining your command set, the next step is to make sure that you received the correct message. Losing just a single bit can change a message of "Activate X-ray sensors" into "Destroy X-ray sensors" or similar. This is usually done with a checksum, which requires some more bits to transmit.
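A minimal checksum sketch, assuming an 8-bit CRC with polynomial 0x07 and initial value 0 (any CRC would do; this choice is an assumption). Any CRC like this one catches every single-bit error, such as the flipped bit below that turns command 0x01 into 0x05. It detects the error but cannot correct it, so the ship would have to ask for a resend.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 (polynomial x^8 + x^2 + x + 1, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

def frame(payload: bytes) -> bytes:
    """Append the CRC byte to the payload before sending."""
    return payload + bytes([crc8(payload)])

def check(received: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the last byte."""
    return crc8(received[:-1]) == received[-1]

sent = frame(bytes([0x01, 0x08]))               # activate X-ray sensor array
garbled = bytes([sent[0] ^ 0b100]) + sent[1:]   # one bit flipped in transit
print(check(sent), check(garbled))              # True False
```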



            Have a look at the difference between two common data transmission protocols for the internet: UDP and TCP. UDP is more efficient with respect to transfer rate, while TCP trades some efficiency for reliability, adding overhead such as sequence numbers, acknowledgements, and retransmission of lost data.














            edited 9 hours ago by Brythan

            answered 16 hours ago by Elmy








            • +1. If you have even the slightest notion of the kinds of things you want to say you can vastly reduce the amount of information required. “Tell [astronaut] that their [medical property] is [phrase detailing concerns]” could theoretically only take a couple of bytes to transmit if the parser is cleverly designed. – Joe Bloggs, 15 hours ago

            • So, basically, Q codes, created circa 1909 by the British government, to be used by maritime ships (both civilian and military) for precisely the reason in the question. As a nice bonus, language is irrelevant, so long as each party has a lookup table in their own language. – Chronocidal, 15 hours ago

            • 1 byte -> 15 commands, 2 bytes -> 255 commands...what? Why not 256 and 16384? – genesis, 15 hours ago

            • You could take AS Interface as a reference. A small scale industrial communication bus system – Alexander von Wernherr, 14 hours ago

            • If using the terminated ID approach Huffman encoding could aid in ensuring the most common commands are the more efficient. Also, you might want to add to the protocol a mechanism for including parameters with the command as a separate parameter command could easily take more space than simply allowing each command to have an arbitrary number of parameters. – FluxIX, 12 hours ago














            • 6




              $begingroup$
              +1. If you have even the slightest notion of the kinds of things you want to say you can vastly reduce the amount of information required. “Tell [astronaut] that their [medical property] is [phrase detailing concerns]” could theoretically only take a couple of bytes to transmit if the parser is cleverly designed.
              $endgroup$
              – Joe Bloggs
              15 hours ago






            • 19




              $begingroup$
              So, basically, Q codes, created circa 1909 by the British government, to be used by maritime ships (both civilian and military) for precisely the reason in the question. As a nice bonus, language is irrelevant, so long as each party has a lookup table in their own language.
              $endgroup$
              – Chronocidal
              15 hours ago






            • 8




              $begingroup$
              1 byte -> 15 commands, 2 bytes -> 255 commands...what? Why not 256 and 16384?
              $endgroup$
              – genesis
              15 hours ago










            • $begingroup$
              You could take AS Interface as a reference. A small scale industrial communication bus system
              $endgroup$
              – Alexander von Wernherr
              14 hours ago






            • 2




              $begingroup$
              If using the terminated ID approach Huffman encoding could aid in ensuring the most common commands are the more efficient. Also, you might want to add to the protocol a mechanism for including parameters with the command as a separate parameter command could easily take more space than simply allowing each command to have an arbitrary number of parameters.
              $endgroup$
              – FluxIX
              12 hours ago








            6




            6




            $begingroup$
            +1. If you have even the slightest notion of the kinds of things you want to say you can vastly reduce the amount of information required. “Tell [astronaut] that their [medical property] is [phrase detailing concerns]” could theoretically only take a couple of bytes to transmit if the parser is cleverly designed.
            $endgroup$
            – Joe Bloggs
            15 hours ago




            $begingroup$
            +1. If you have even the slightest notion of the kinds of things you want to say you can vastly reduce the amount of information required. “Tell [astronaut] that their [medical property] is [phrase detailing concerns]” could theoretically only take a couple of bytes to transmit if the parser is cleverly designed.
            $endgroup$
            – Joe Bloggs
            15 hours ago




            19




            19




            So, basically, Q codes: created circa 1909 by the British government for use by maritime ships (both civilian and military), for precisely the reason in the question. As a nice bonus, language is irrelevant, so long as each party has a lookup table in its own language.
            – Chronocidal
            15 hours ago








            8








            1 byte -> 15 commands, 2 bytes -> 255 commands...what? Why not 256 and 16384?
            – genesis
            15 hours ago















            13












            According to Schneier, the entropy of English text is below 1.6 bits per letter. Given a constraint as tight as yours, I would expect people to come up with compression algorithms that get close to that.
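To put that 1.6-bit figure in perspective, here is a small Python sketch (my addition, not from the answer) that measures the zeroth-order entropy of a sample text: the bits per character you would need if every character were coded independently, using only its own frequency. Single-letter statistics alone leave English well above 1.6 bits per letter; getting near that figure needs a model that exploits context between letters. The sample string is an arbitrary placeholder.

```python
import math
from collections import Counter


def zeroth_order_entropy(text: str) -> float:
    """Shannon entropy in bits per character, treating characters as independent."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Placeholder sample; a real estimate needs a large, representative corpus.
    sample = "the crew reports oxygen levels nominal and fuel reserves stable"
    print(f"{zeroth_order_entropy(sample):.2f} bits/char vs 8 bits/char for ASCII")
```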



            If you don't need the full power of English, you might get much better compression by pre-defining a small set of words that would be sufficient, similar in principle to https://xkcd.com/1133/



            I think you need to answer two important questions:




            1. Is the system pre-defined, i.e. can there be word lists?

            2. Are characters/words sent individually, or can you apply compression to a large amount of data and then send it in bulk?
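If the answer to question 1 is yes, each word costs only ceil(log2 N) bits for a shared list of N words; a list of "ten hundred" words, as in the xkcd comic, would need 10 bits per word. A minimal Python sketch, assuming a tiny hypothetical vocabulary `VOCAB` that both ends hold identically (a real list would be chosen for the domain):

```python
import math

# Hypothetical shared vocabulary; both ends must hold an identical copy.
VOCAB = ["status", "ok", "low", "high", "oxygen", "fuel", "power", "crew",
         "send", "help", "now", "later", "rocket", "engine", "fire", "stop"]
INDEX = {word: i for i, word in enumerate(VOCAB)}
BITS_PER_WORD = max(1, math.ceil(math.log2(len(VOCAB))))  # 16 words -> 4 bits


def encode_words(message: str) -> str:
    """Map each word to a fixed-width index; raises KeyError for unknown words."""
    return "".join(format(INDEX[w], f"0{BITS_PER_WORD}b") for w in message.split())


def decode_words(bits: str) -> str:
    if len(bits) % BITS_PER_WORD:
        raise ValueError("bit string is not a whole number of words")
    chunks = (bits[i:i + BITS_PER_WORD] for i in range(0, len(bits), BITS_PER_WORD))
    return " ".join(VOCAB[int(chunk, 2)] for chunk in chunks)


if __name__ == "__main__":
    msg = "oxygen low send help now"
    bits = encode_words(msg)
    print(bits, f"({len(bits)} bits vs {8 * len(msg)} bits as ASCII)")
```

The price is rigidity: anything outside the list cannot be said at all, which is why question 1 matters so much.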


            If you want something simple, sciency, and requiring no setup, go with Huffman coding of individual letters based on their frequency in English. ;)
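Huffman coding of individual letters fits in a few lines of Python. Here the frequency table is counted from a short placeholder sample, not from real English statistics. In practice both ends would agree on a table built from a large corpus, since the receiver needs the identical code to decode.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(freqs: dict) -> dict:
    """Build a prefix-free code {symbol: bit string} from {symbol: weight}."""
    tiebreak = count()  # keeps heapq from ever comparing the dicts
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a lone symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # the two lightest subtrees...
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}   # ...become siblings
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def huffman_encode(text: str, code: dict) -> str:
    return "".join(code[ch] for ch in text)


def huffman_decode(bits: str, code: dict) -> str:
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:  # prefix-free, so the first match is the right one
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(out)


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"  # placeholder corpus
    code = huffman_code(Counter(sample))
    bits = huffman_encode(sample, code)
    print(f"{len(bits) / len(sample):.2f} bits/char")
```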















            • 1




              As you can see with XKCD, it gets so wordy to convey a simple idea. It may still work better than ASCII, though.
              – Andrey
              6 hours ago










            • @Andrey I'm not sure I would call rocket science a "simple idea".
              – TheHansinator
              3 hours ago






            • 1




              @Andrey Though, in all seriousness, the words for the compressed language would probably be chosen specifically for the domain; e.g. a language designed for a spaceship would probably include words like "rocket".
              – TheHansinator
              3 hours ago
























            edited 16 hours ago

























            answered 16 hours ago









            genesis

            59217















            6












            You might look at digital modes for amateur radio here. Some of those modes use what's called "varicode", where different characters have different symbol lengths (Morse code is a varicode system: more commonly used letters are shorter in terms of transmission time). When sending English text, a varicode built around letter frequencies reduces the average number of bits per character over a sufficiently large sample (which reasonably ought to include a large number of messages). If "text speak" is used commonly, it might make sense to design the varicode around letter frequencies in that particular text format.
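A rough Python sketch of a varicode in the style of PSK31's Varicode: every code word starts and ends with a 1 and never contains two consecutive 0s, so a "00" between code words unambiguously marks a character boundary. The code words are generated mechanically and assigned by a commonly quoted English letter-frequency ordering (an approximation); this is not the actual PSK31 table.

```python
from itertools import product

# A commonly quoted English letter-frequency ordering (most frequent first), with
# space placed in front. Approximate; a real design would rank symbols from a
# corpus of actual messages, textspeak included if that is what gets sent.
RANKED = " etaoinshrdlcumwfgypbvkjxqz"


def varicode_words():
    """Yield bit strings that start and end with 1 and contain no '00', shortest first."""
    length = 1
    while True:
        for bits in product("01", repeat=length):
            word = "".join(bits)
            if word[0] == "1" and word[-1] == "1" and "00" not in word:
                yield word
        length += 1


# Most frequent symbol gets the shortest code word.
TABLE = dict(zip(RANKED, varicode_words()))
REVERSE = {word: sym for sym, word in TABLE.items()}


def varicode_encode(text: str) -> str:
    # "00" after every character marks the boundary; it never occurs inside a word.
    return "".join(TABLE[ch] + "00" for ch in text.lower())


def varicode_decode(bits: str) -> str:
    return "".join(REVERSE[word] for word in bits.split("00") if word)


if __name__ == "__main__":
    msg = "fuel low"
    bits = varicode_encode(msg)
    print(bits, f"({len(bits)} bits vs {8 * len(msg)} bits as ASCII)")
```

The self-synchronising separator is the design choice that matters here: a receiver that misses some bits can resume decoding at the next "00".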



            If longer messages are common, some form of compression would make sense: text typically compresses well with common compression algorithms, but the compression headers make this inefficient for very small blocks of data (text or otherwise).
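The header overhead is easy to see with Python's standard `zlib` module. A zlib stream wraps the compressed data in a 2-byte header and a 4-byte Adler-32 checksum, so a short message comes out larger than it went in, while a long, repetitive status log shrinks. The messages below are placeholders.

```python
import zlib


def compressed_size(message: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 message (level 9)."""
    return len(zlib.compress(message.encode("utf-8"), 9))


if __name__ == "__main__":
    short_text = "all ok"
    long_text = "status nominal; oxygen nominal; fuel nominal. " * 20
    for text in (short_text, long_text):
        raw = len(text.encode("utf-8"))
        print(f"{raw:5d} bytes raw -> {compressed_size(text):5d} bytes compressed")
```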
















                answered 15 hours ago









                Zeiss Ikon

                2,584117



























                    4













                    Textspeak



                    SMS messages were originally limited to 160 characters, so textspeak evolved to reduce everything to the most compact form through abbreviations, acronyms and emoticons.
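The 160 figure is itself a bit-packing trick: an SMS payload is 140 bytes (1120 bits), and the GSM default alphabet uses 7 bits per character, so 1120 / 7 = 160 characters fit. A rough Python sketch of that packing, assuming plain 7-bit ASCII code points rather than the actual GSM 03.38 character table:

```python
def pack7(text: str) -> bytes:
    """Pack 7-bit characters least-significant-bit first into bytes."""
    acc, nbits, out = 0, 0, bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"{ch!r} is not a 7-bit character")
        acc |= code << nbits   # append 7 new bits above the ones still pending
        nbits += 7
        while nbits >= 8:      # flush every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                  # leftover bits, zero-padded
        out.append(acc & 0xFF)
    return bytes(out)


def unpack7(data: bytes, length: int) -> str:
    """Inverse of pack7; `length` is needed because the padding is ambiguous."""
    acc, nbits, chars = 0, 0, []
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        while nbits >= 7 and len(chars) < length:
            chars.append(chr(acc & 0x7F))
            acc >>= 7
            nbits -= 7
    return "".join(chars)


if __name__ == "__main__":
    print(len(pack7("x" * 160)), "bytes for 160 characters")
```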



                    Sounds like a good reason to send teenagers into space...



















                    • I'm pretty sure that style of writing predates common SMS usage. I recall half the people on the internet talking that way in the late 90s, a time when most people did not have internet access, but even fewer people could send text messages.
                      – Aaron
                      8 hours ago










                    • Was previously beeper speak: 143 133 43 43 5318008
                      – RIanGillis
                      6 hours ago
























                    edited 13 hours ago









                    Glorfindel

                    4151614














                    answered 16 hours ago









                    Thorne

                    18.5k42657















                    2













                    Building on other answers



                    In addition to the different encoding and compression methods, one thing to look into is shorthand techniques that allow you to drop letters while still being able to interpret the message. Some examples:




                    • it, to -> t

                    • is -> s

                    • have -> hv

                    • cat -> ct

                    • are -> r


                    Example sentence: hw r u?
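Most of the examples above follow one mechanical rule, delete the vowels, plus a few special cases such as "you" -> "u". A rough Python sketch under that assumption; the `SPECIAL` override table is hypothetical. Note that the rule is lossy ("it" and "to" both become "t"), so the reader has to resolve ambiguity from context.

```python
import re

VOWELS = set("aeiouAEIOU")
# Hypothetical overrides for words the plain rule handles badly.
SPECIAL = {"you": "u"}


def shorten_word(word: str) -> str:
    if word.lower() in SPECIAL:
        return SPECIAL[word.lower()]
    stripped = "".join(ch for ch in word if ch not in VOWELS)
    return stripped or word  # keep words like "a" or "I" that would otherwise vanish


def shorthand(text: str) -> str:
    """Apply shorten_word to every run of letters, leaving spaces and punctuation."""
    return re.sub(r"[A-Za-z]+", lambda m: shorten_word(m.group(0)), text)


if __name__ == "__main__":
    print(shorthand("how are you?"))
```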



                    An alternative approach



                    Encode your information in time delays



                    Presumably there is some reason that you can't speed up the data transmission, but perhaps you can slow it down. At 5 bits per hour, that's 12 minutes between bits. Instead of sending each bit at a regular interval, you can delay the transmission of bits and use the delay time itself to convey information.



                    So if you expect a minimum of 12 minutes between bits, you can encode data in the delay as follows (time is in mm:ss format):




                    • 12:00 = 0

                    • 12:05 = 1

                    • 12:10 = 2

                    • 12:15 = 3

                    • etc


                    The more values you encode in each delay, the longer the average gap becomes and the fewer pulses per day you'll be able to send, so there is an optimal balance you'll have to find based on the minimum delay interval you consider acceptable. You can then use the bits themselves as an error-checking mechanism, or still use them to transmit data.
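The trade-off can be worked through with a small Python sketch. It assumes the 12-minute minimum gap from the answer, a hypothetical 5-second step per value (matching the table above), that each pulse still carries one ordinary bit, and that all delay values are equally likely. Under those assumptions, adding more delay levels first raises the total rate and then lowers it again.

```python
import math

MIN_GAP_S = 12 * 60  # 5 bits per hour -> one pulse every 12 minutes
STEP_S = 5           # assumed timing resolution: 12:00, 12:05, 12:10, ...


def delay_for(value: int, levels: int) -> int:
    """Gap in seconds that encodes `value` (0 <= value < levels)."""
    if not 0 <= value < levels:
        raise ValueError("value out of range")
    return MIN_GAP_S + STEP_S * value


def value_for(gap_s: float, levels: int) -> int:
    """Recover the value from a measured gap, rounding to the nearest step."""
    value = round((gap_s - MIN_GAP_S) / STEP_S)
    if not 0 <= value < levels:
        raise ValueError("gap does not match any level")
    return value


def bits_per_hour(levels: int) -> float:
    """Average throughput: 1 bit in the pulse plus log2(levels) bits in its gap."""
    mean_gap = MIN_GAP_S + STEP_S * (levels - 1) / 2  # uniform values assumed
    return (3600 / mean_gap) * (1 + math.log2(levels))


if __name__ == "__main__":
    for levels in (1, 2, 16, 64, 256):
        print(f"{levels:4d} levels: {bits_per_hour(levels):5.1f} bits/hour")
```

With these particular numbers the rate peaks at a few tens of levels, but where the peak lands depends entirely on the assumed timing resolution and how reliably the receiver can measure gaps.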
















                        answered 9 hours ago









                        anjama

                        911



























                            2













                            Not Morse code



                            From Wikipedia:




                            International Morse code is composed of five elements:[1]




                            1. short mark, dot or "dit" (▄): "dot duration" is one time unit long

                            2. longer mark, dash or "dah" (▄▄▄): three time units long

                            3. inter-element gap between the dots and dashes within a character: one dot duration or one unit long

                            4. short gap (between letters): three time units long

                            5. medium gap (between words): seven time units long




                            If we use one bit to store one unit of information, it takes four bits to transmit even the shortest letter ('e') and its subsequent gap. The next shortest are 'i' and 't' at six bits, then 'a', 'n', and 's' at eight. The longest character in the Morse alphabet is '0', which requires five dashes, or twenty-two units/bits. And that only supports the thirty-six-character Latin alphanumeric alphabet.
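These unit counts can be checked mechanically. This Python sketch applies the five timing rules to the standard Morse table for letters and digits, counting one bit per time unit and including the three-unit gap after each character (widened to seven units between words).

```python
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".", "f": "..-.",
    "g": "--.", "h": "....", "i": "..", "j": ".---", "k": "-.-", "l": ".-..",
    "m": "--", "n": "-.", "o": "---", "p": ".--.", "q": "--.-", "r": ".-.",
    "s": "...", "t": "-", "u": "..-", "v": "...-", "w": ".--", "x": "-..-",
    "y": "-.--", "z": "--..", "0": "-----", "1": ".----", "2": "..---",
    "3": "...--", "4": "....-", "5": ".....", "6": "-....", "7": "--...",
    "8": "---..", "9": "----.",
}


def char_units(ch: str) -> int:
    """Units for one character plus the 3-unit gap that follows it."""
    code = MORSE[ch.lower()]
    marks = sum(1 if s == "." else 3 for s in code)  # dit = 1, dah = 3
    inner_gaps = len(code) - 1                      # 1 unit between elements
    return marks + inner_gaps + 3


def message_units(text: str) -> int:
    """Units for a message; each word gap is 7 units instead of the 3-unit letter gap."""
    words = text.split()
    if not words:
        return 0
    total = sum(char_units(ch) for word in words for ch in word)
    return total + 4 * (len(words) - 1)


if __name__ == "__main__":
    for msg in ("eat", "zoo"):
        print(msg, message_units(msg), "units vs", 8 * len(msg), "bits as ASCII")
```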



                            Morse is designed around humans. Humans do better with variable-length than fixed-length codes, as we don't have good timing ability (we can't consistently tell a five-unit pause from a four-unit pause). But if these messages are being transmitted computer to computer, computers are very good at timing, so we can use superior fixed-length formats. Heck, even for humans, twelve-minute units make it easy to track whether you're getting a pause or a dot (a zero or a one).



                            Worse still, if you are transmitting Morse over bits, (extended) ASCII's fixed eight bits per character beats it unless the message is composed almost entirely of 'e', 'i', 't', 'a', 'n', and 's' (the only characters that fit in eight units or fewer).



                            Bits



                            Meanwhile, if we transmit ASCII, we could transmit a 0 with eight bits. If we break things into nybbles, we can transmit one nybble plus a checksum (parity) bit every hour, since five bits at twelve minutes each is sixty minutes. So it takes two hours to transmit one character with some error detection included, or ninety-six minutes without the checksums.
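A Python sketch of the nybble scheme, assuming the "checksum bit" is a simple even-parity bit: each 8-bit character is split into two 4-bit nybbles, each followed by one parity bit, giving two 5-bit frames of an hour each. A single parity bit detects any one flipped bit in a frame but cannot correct it.

```python
BIT_MINUTES = 12  # 5 bits per hour


def parity(bits: str) -> str:
    """Even parity: '1' if the count of 1s is odd, so the frame total is even."""
    return str(bits.count("1") % 2)


def to_frames(text: str) -> list[str]:
    frames = []
    for byte in text.encode("latin-1"):  # 8-bit (extended ASCII) characters
        for nybble in (byte >> 4, byte & 0x0F):
            bits = format(nybble, "04b")
            frames.append(bits + parity(bits))
    return frames


def from_frames(frames: list[str]) -> str:
    nybbles = []
    for frame in frames:
        bits, check = frame[:4], frame[4]
        if parity(bits) != check:
            raise ValueError(f"parity error in frame {frame}")
        nybbles.append(int(bits, 2))
    data = bytes((hi << 4) | lo for hi, lo in zip(nybbles[::2], nybbles[1::2]))
    return data.decode("latin-1")


if __name__ == "__main__":
    frames = to_frames("A")
    print(frames, len(frames) * 5 * BIT_MINUTES, "minutes")
```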



                            If we instead use ten-bit codes (two hours each), we can do something like Lempel-Ziv. The first 256 codes are the extended ASCII set; the remaining 768 represent multiple characters. Common sequences (e.g. "the ", "ing", and "tion") would have their own ten-bit representation, e.g. 0100000000. This allows the full flexibility of ASCII while also producing a shorter message on average.



                            The Lempel-Ziv algorithm builds the dictionary from the message itself. We can do better by agreeing on a dictionary beforehand. You can also use this to integrate the error correction and the dictionary, which improves your effective speed.
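The ten-bit scheme described above is essentially LZW (Lempel-Ziv-Welch): codes 0-255 are single bytes and codes 256-1023 are sequences learned from the message. A minimal Python sketch, with the dictionary capped at 1024 entries (when full, it simply stops growing). It builds the dictionary from the message itself; a pre-agreed dictionary would be the further improvement mentioned above.

```python
CODE_BITS = 10
MAX_CODES = 1 << CODE_BITS  # 1024 codes: 256 single bytes + 768 learned sequences


def lzw_encode(data: bytes) -> list[int]:
    table = {bytes([i]): i for i in range(256)}
    current, out = b"", []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate            # keep extending the match
            continue
        out.append(table[current])         # emit the longest known prefix
        if len(table) < MAX_CODES:
            table[candidate] = len(table)  # learn the new sequence
        current = bytes([byte])
    if current:
        out.append(table[current])
    return out


def lzw_decode(codes: list[int]) -> bytes:
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):           # sequence the encoder learned this step
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid code {code}")
        out.append(entry)
        if len(table) < MAX_CODES:         # mirrors the encoder's growth exactly
            table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    msg = b"the thing in the string is the thing " * 8
    codes = lzw_encode(msg)
    print(len(codes) * CODE_BITS, "bits vs", len(msg) * 8, "bits as ASCII")
```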



Numbers are generally better sent as binary values than as characters. For example, instead of sending the ASCII string "3840" (four characters, 32 bits), just send 111100000000. That's only twelve bits, hardly more than a single character.
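Here is a quick check of that comparison. It assumes the receiver already knows the field is a number and how many bits it spans; the answer doesn't specify the framing.

```python
n = 3840
# ASCII: eight bits for each digit character '3', '8', '4', '0'.
as_ascii = "".join(format(b, "08b") for b in str(n).encode("ascii"))
# Binary: only as many bits as the value needs.
as_binary = format(n, "b")

print(len(as_ascii), as_ascii)
print(len(as_binary), as_binary)   # 12 111100000000
```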
















                                answered 8 hours ago









Brythan


A receiver will pick up the radio signal plus background noise (most notably cosmic background radiation). The received noise power grows with receiver bandwidth (for white noise, it is proportional to the bandwidth). So to get a good signal-to-noise ratio, one can transmit the radio signal within a very narrow frequency band and put a very narrow band-pass filter on the front of the receiver.



EXAMPLE: Suppose the receiver picks up 1 microwatt of radio signal and 1 milliwatt of noise power with a 1 MHz bandwidth (an SNR of 0.001).



Dropping the bandwidth to 10 Hz would keep the 1 microwatt of signal power (assuming the signal fits within 10 Hz) but cut the received noise power to 10 nanowatts, giving an SNR of 100.
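Here is the same arithmetic as a short sketch. It assumes white noise (noise power proportional to bandwidth) and that all of the signal power fits inside the narrower filter.

```python
import math

def snr_db(signal_w: float, noise_w: float) -> float:
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(signal_w / noise_w)

signal = 1e-6        # 1 microwatt of received signal
noise_1mhz = 1e-3    # 1 milliwatt of noise in a 1 MHz bandwidth

# White noise: noise power scales linearly with bandwidth.
noise_10hz = noise_1mhz * (10 / 1e6)   # about 1e-8 W = 10 nanowatts

print(signal / noise_1mhz, snr_db(signal, noise_1mhz))   # about 0.001, about -30 dB
print(signal / noise_10hz, snr_db(signal, noise_10hz))   # about 100, about +20 dB
```

Narrowing the bandwidth by a factor of 100,000 improves the SNR by the same factor, i.e. 50 dB.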



Consider a protocol like PSK31 (or similar), used by ham radio operators, instead of Morse code.



PSK31 sends 1s and 0s as phase changes on a pure tone, using relatively long symbols (31.25 per second). The longer each symbol lasts, the narrower the filter at the receiver can be. PSK modes can send low-data-rate messages across the planet using only a few watts of power.
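Here is a simplified sketch of how PSK31 keys its bits. A 0 is sent as a 180° phase reversal of the carrier and a 1 as no change, so the receiver only compares each symbol's phase with the previous one. The sketch ignores the amplitude shaping at reversals and the Varicode character table that real PSK31 uses.

```python
import math

def modulate(bits: str, start_phase: float = 0.0) -> list[float]:
    """Return the carrier phase (radians) for each symbol period."""
    phase, phases = start_phase, []
    for b in bits:
        if b == "0":
            phase = (phase + math.pi) % (2 * math.pi)   # 0 -> phase reversal
        phases.append(phase)                             # 1 -> keep the same phase
    return phases

def demodulate(phases: list[float], start_phase: float = 0.0) -> str:
    """Recover bits by comparing each symbol's phase with the one before it."""
    bits, prev = [], start_phase
    for p in phases:
        # cos of the phase difference is near -1 for a reversal, near +1 for none.
        bits.append("0" if math.cos(p - prev) < 0 else "1")
        prev = p
    return "".join(bits)

print(demodulate(modulate("1011001")))   # 1011001
```

Because only phase differences matter, moderate phase noise on each symbol still decodes correctly.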



Another (more complex) alternative is to use long strings of physical 1s and 0s to represent a single symbol in the protocol. GPS uses this method, for example. The GPS signal is normally about 30 times weaker than the background noise, but by correlating against a long repeating code of 1023 bits ("chips"), the receiver can still lock onto the signal.



EXAMPLE: Define a long sequence of physical 1s and 0s for each letter of the alphabet, with each code very different from all the others.

                                    Let A be 00101010 10001010 10100101 00101010 ...

                                    Let B be 10100001 10100101 00010101 00010100 ...

                                    Let C be 01001010 01010100 00010100 00110101 ...



                                    The sequences may be thousands of bits long if you want. The patterns are generated by a computer automatically when the user types a letter on the keyboard.



The physical bit sequences are sent at a much higher rate than the actual symbols. For example, if you want to send one symbol per second and your sequences are 1000 bits long, then you send the physical bits at 1000 bits per second.



When receiving the signal plus noise, the noise will cause the receiver to make the wrong decision on some percentage of the physical 1s and 0s. The receiver stores the received bit pattern and compares it to each of the codes, then selects the code that most closely matches the received pattern. Even if a large fraction of the received bits are wrong (as long as it stays below half), the received pattern is still likely to match the transmitted code more closely than any of the other codes. Thus the receiver can determine what the transmitter sent even if the background noise is much stronger than the received radio signal.
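Here is a toy version of the letter-code example, assuming randomly generated 1000-bit codes. Real systems such as GPS use carefully designed code families with good correlation properties instead. The channel flips 30% of the received bits, yet picking the code with the smallest Hamming distance still recovers the letter.

```python
import random

CODE_LEN = 1000
rng = random.Random(42)   # fixed seed so the example is repeatable

# One long pseudo-random code per letter (hypothetical code book).
CODEBOOK = {ch: [rng.randint(0, 1) for _ in range(CODE_LEN)]
            for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}

def transmit(letter: str, flip_prob: float) -> list[int]:
    """Send a letter's code through a channel that flips each bit with probability flip_prob."""
    return [b ^ (rng.random() < flip_prob) for b in CODEBOOK[letter]]

def receive(bits: list[int]) -> str:
    """Pick the letter whose code differs from the received bits in the fewest places."""
    def distance(code: list[int]) -> int:
        return sum(a != b for a, b in zip(code, bits))
    return min(CODEBOOK, key=lambda ch: distance(CODEBOOK[ch]))

received = transmit("C", flip_prob=0.3)
print(sum(a != b for a, b in zip(received, CODEBOOK["C"])), "bits flipped")
print("decoded:", receive(received))
```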



Other advantages of long codes are that they inherently correct physical bit errors at the receiver, and that different transmitters can each use different code sets so they can talk at the same time (this is how CDMA cell phones work).
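Here is a minimal sketch of the shared-channel idea. It assumes two transmitters with orthogonal length-8 Walsh codes, which is a common textbook choice; the answer doesn't say which codes are used. The channel is ideal and noise-free. Bits are sent as +1/-1, the channel adds the two signals together, and each receiver correlates against its own code.

```python
def walsh(n: int) -> list[list[int]]:
    """Rows of an n x n Hadamard matrix (n a power of two); rows are mutually orthogonal."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

CODES = walsh(8)
code_a, code_b = CODES[1], CODES[2]   # one code per transmitter

def spread(bit: int, code: list[int]) -> list[int]:
    """Map bit 1 -> +1 and bit 0 -> -1, then multiply by the code chips."""
    s = 1 if bit else -1
    return [s * c for c in code]

def despread(signal: list[int], code: list[int]) -> int:
    """Correlate with one code; the other, orthogonal code contributes zero."""
    corr = sum(x * c for x, c in zip(signal, code))
    return 1 if corr > 0 else 0

# Both transmitters talk at once; the channel simply adds their chips together.
on_air = [a + b for a, b in zip(spread(1, code_a), spread(0, code_b))]
print(despread(on_air, code_a), despread(on_air, code_b))   # 1 0
```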














                                    edited 9 hours ago

























                                    answered 10 hours ago









user4574













• Yeah, something similar to PSK31, or maybe JT65, FT8 or some such for their low S/N characteristics, was my first thought as well. One benefit of it, and many similar encoding schemes (I'm not sure I really want to call it a modulation per se, though one could make an argument that it's a baseband modulation) is that they use variable-length encodings. That requires some kind of synchronization, but allows the more commonly used code points to be encoded more compactly and thus transmitted more quickly.
– a CVn
9 hours ago




















1. Encode whole words instead of single letters.

2. Use Huffman coding based on word frequency in the specific context of space travel, so that frequent words ('the', 'yes', 'shields') get fewer bits than less frequent words.

3. Use Markov chains to take the context of the sentence into account as well.
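Here is a compact sketch of points 1 and 2: Huffman coding over whole words, using Python's `heapq`. The word counts are invented purely for illustration, since a real code book would be built from actual message traffic. Point 3 (Markov context) is not shown.

```python
import heapq
from itertools import count

def huffman_code(freqs: dict[str, int]) -> dict[str, str]:
    """Build a prefix-free code in which frequent words get shorter bit strings."""
    tiebreak = count()  # unique tiebreaker so the heap never compares the dicts
    heap = [(f, next(tiebreak), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical word counts from past transmissions.
FREQS = {"the": 40, "yes": 25, "no": 15, "shields": 10, "engine": 6, "nebula": 4}
CODE = huffman_code(FREQS)

def encode(words: list[str]) -> str:
    return "".join(CODE[w] for w in words)

def decode(bits: str) -> list[str]:
    lookup = {c: w for w, c in CODE.items()}
    words, cur = [], ""
    for b in bits:
        cur += b
        if cur in lookup:          # prefix-free, so the first match is the word
            words.append(lookup[cur])
            cur = ""
    return words

for w in FREQS:
    print(w, CODE[w])
```

With these made-up counts, the average is 2.25 bits per word, versus 3 bits for a fixed-length code over six words.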
















                                        answered 7 hours ago









                                        Helena


                                            Oops! When I was writing this, I forgot you said "5 bits per hour" and was thinking "5 bits per day"... read all instances of "day" below as "hour". I'm leaving this message temporarily in case I missed any instances.



                                            Here is the literal answer to your question:



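                                            Use a base64-style character encoding. At 6 bits per character you get 64 symbols, enough for upper- and lower-case English letters plus the digits 0-9, which is precisely what you said you wanted. Standard base64 spends its last two symbols on "+" and "/"; in your own alphabet they could just as well be space and period. At 6 bits, each character is just 1 bit too many to fit into 1 hour's worth of transmission in your circumstance.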
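Here is a minimal Python sketch of the plain 6-bit scheme and its airtime at 5 bits per hour. The alphabet is an assumption: base64's letters and digits, with space and period in the two slots base64 uses for "+" and "/". `encode_text`, `decode_text` and `airtime` are hypothetical names.

```python
import string

# Hypothetical 64-symbol alphabet: base64's A-Z, a-z, 0-9, but with space and
# period in the two slots base64 uses for '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + " ."
assert len(ALPHABET) == 64

BITS_PER_CHAR = 6   # 2**6 == 64 symbols
BITS_PER_HOUR = 5   # link rate given in the question

def encode_text(text):
    """Return the message as a '0'/'1' string, 6 bits per character.

    Raises ValueError for characters outside ALPHABET.
    """
    return "".join(format(ALPHABET.index(ch), "06b") for ch in text)

def decode_text(bits):
    return "".join(ALPHABET[int(bits[i:i + BITS_PER_CHAR], 2)]
                   for i in range(0, len(bits), BITS_PER_CHAR))

def airtime(n_bits):
    """(whole hours, leftover bits) needed to send n_bits at 5 bits/hour."""
    return divmod(n_bits, BITS_PER_HOUR)

bits = encode_text("Hey Bob")
print(len(bits), airtime(len(bits)))   # 42 (8, 2)
```

This reproduces the "Hey Bob" figure given below: 7 characters × 6 bits = 42 bits = 8 hours + 2 bits.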



                                            And here are some enhancements, built by adding special "modes":



                                            That alphabet includes both upper- and lower-case letters. If you are fine with restricting yourself to one case, which would still fulfill your requirements, you free up 26 character slots for more punctuation or other enhancements (up to 26 extra transmission modes). I would recommend using some of this space to represent extremely common words or short phrases that you would use very often. Then use a few of the slots for other special meanings, such as "the next few characters represent status codes" or "the following data is compressed".



                                            Mode 1: Table of most common words or phrases



                                            For the examples below, I'll assume that 2 of the characters act as markers for two different word/phrase lookup tables, each followed by a 9-bit index. A lookup then takes exactly 3 hours to send, including the initial 6-bit marker (6 + 9 = 15 bits = 3 hours). Each table holds 2^9 = 512 entries, so the two together give 1024 different shortened words or phrases.



                                            Using this format:



                                            "Hey Bob" as plain text requires 6 × 7 = 42 bits = 8 hours + 2 bits.



                                            "communication array damaged by [reason]", assuming "communication array" and "damaged" are both in the lookup table, would take 9 hours + 3 bits (two 15-bit lookups plus 3 plaintext characters for "by "), plus however long it takes to send the reason. "communication array damaged by Klingon torp" would take 24 hours, or less if "Klingon" or "torp" were sent as lookup words instead of as plain text.

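The Mode 1 arithmetic can be checked with a small cost model. This Python sketch is illustrative only. It uses a single hypothetical table; the answer's two tables would simply use two different marker characters. It also assumes table entries include their trailing space. Tokenization uses greedy longest match, which is simple but does not always find the cheapest encoding. `tokenize_lookup` and `cost_bits` are hypothetical names.

```python
MODE_CHAR_BITS = 6     # the marker character that announces a lookup
LOOKUP_INDEX_BITS = 9  # 2**9 = 512 entries per table
CHAR_BITS = 6          # an ordinary 6-bit character

# Hypothetical table contents. Entries carry their trailing space, which
# matches counting "by " as the only plaintext after "damaged".
LOOKUP_TABLE = ["communication array ", "damaged ", "comms ", "good "]
assert len(LOOKUP_TABLE) <= 2 ** LOOKUP_INDEX_BITS

def tokenize_lookup(text, table=LOOKUP_TABLE):
    """Greedy longest match: returns ('lookup', index) and ('char', ch) tokens."""
    tokens, i = [], 0
    while i < len(text):
        match = max((p for p in table if text.startswith(p, i)), key=len, default=None)
        if match is None:
            tokens.append(("char", text[i]))
            i += 1
        else:
            tokens.append(("lookup", table.index(match)))
            i += len(match)
    return tokens

def cost_bits(tokens):
    """Total transmitted bits: 15 per lookup, 6 per plain character."""
    return sum(MODE_CHAR_BITS + LOOKUP_INDEX_BITS if kind == "lookup" else CHAR_BITS
               for kind, _ in tokens)

msg = "communication array damaged by Klingon torp"
print(divmod(cost_bits(tokenize_lookup(msg)), 5))  # (24, 0): 24 hours
```

Dropping "Klingon torp" leaves "communication array damaged by ", which costs 48 bits, or 9 hours + 3 bits. This matches the figure above.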


                                            Mode 2: Look-back



                                            This is a "repeat recent word" mode. Recently used data is among the most likely data to be used next (temporal locality), which is what a PC memory cache exploits. We can do something similar by making 1 of the character slots mean "the next 4 bits refer to a previous word; count back that many words in the most recently transmitted data."



                                            With this, "Klingon fleet approaches from 294 and Klingon admiral on comms saying Klingon destroyers equipped with new black hole tech" lets you shorten the second and third instances of "Klingon" to exactly 2 hours' worth of data each (6 + 4 = 10 bits). The first of these uses "0110" (6) as the 4-bit look-back value and the second uses "0101" (5). In communications where words repeat often, this could save a lot of time. Pronouns like "their" could have been used in some places instead, but "their" takes 7 plaintext characters (including surrounding spaces), which in this case would have taken 6 hours + 2 bits longer to send.
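The look-back mode can be sketched the same way in Python, again with hypothetical function names. The sketch assumes words are separated by single spaces, so spaces are not counted. It only uses a look-back when the 10-bit reference is cheaper than spelling the word out. It reproduces the two distances from the Klingon example.

```python
MODE_CHAR_BITS = 6
LOOKBACK_BITS = 4   # distances 1..15 fit in 4 bits
CHAR_BITS = 6

def lookback_encode(words):
    """Replace a word with ('back', distance bits) when it appeared within the
    last 15 words and the reference is cheaper than spelling it out."""
    tokens = []
    for i, word in enumerate(words):
        dist = next((d for d in range(1, 2 ** LOOKBACK_BITS)
                     if i - d >= 0 and words[i - d] == word), None)  # nearest match
        if dist is not None and MODE_CHAR_BITS + LOOKBACK_BITS < CHAR_BITS * len(word):
            tokens.append(("back", format(dist, f"0{LOOKBACK_BITS}b")))
        else:
            tokens.append(("word", word))
    return tokens

def lookback_decode(tokens):
    words = []
    for kind, value in tokens:
        # words[-d] is the word d positions before the one being decoded
        words.append(words[-int(value, 2)] if kind == "back" else value)
    return words

msg = ("Klingon fleet approaches from 294 and Klingon admiral on comms "
       "saying Klingon destroyers equipped with new black hole tech").split()
tokens = lookback_encode(msg)
print([v for k, v in tokens if k == "back"])  # ['0110', '0101']
assert lookback_decode(tokens) == msg
```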



                                            Mode 3: Copy/paste, possibly with separate paste-buffers



                                            This allows customized shortcuts that were not thought of before launch, a sort of copy/paste. You send "start copying here", continue the message, then send "stop copying here". Later you can send a single "paste" character to repeat a long passage. For example, you send "comms good, thrusters good, life support good, magnetic-artificial-gravity [copy]working intermittently due to a swarm of flies in the grav-capacitor[end copy] from a meal someone left out for days" once. Each time after that, you send "comms good, thrusters good, life support good, magnetic-artificial-gravity [paste]" until the status changes back to "good". If "comms", "thrusters", "life support", "magnetic-artificial-gravity", and "good" are all in the lookup table, this entire message takes 22 hours + 1 bit to send after the first time. That is seven 15-bit lookups plus the 6-bit paste character, not counting commas and spaces. Even better, follow the paste character with a few bits for a "paste buffer number". Then you could send "[copy1]comms good, [etc.], [copy2]working intermittently due to...[end-copy-2][end-copy-1] from a meal that...". After that, every status update that is the same as the one before takes 1 hour plus a few bits.
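A decoder for the copy/paste mode might look like the Python sketch below, which rests on several assumptions. The token names are hypothetical stand-ins for the mode characters, and tokens are assumed to arrive already parsed from the bit stream. `buffers` is kept between transmissions so a later message can paste text captured earlier. Nested copies (copy1 around copy2) both capture the inner text.

```python
def decode_with_buffers(tokens, buffers):
    """Expand copy/paste tokens. `buffers` persists between transmissions.

    Hypothetical token kinds: ("text", s), ("copy_start", n),
    ("copy_end", n), ("paste", n), where n is the paste-buffer number.
    """
    output, recording = [], {}   # recording: buffer number -> pieces captured so far
    for kind, value in tokens:
        if kind == "copy_start":
            recording[value] = []
        elif kind == "copy_end":
            buffers[value] = "".join(recording.pop(value))
        else:
            text = buffers[value] if kind == "paste" else value
            output.append(text)
            for pieces in recording.values():   # every open copy captures the text
                pieces.append(text)
    return "".join(output)

buffers = {}
first = [("copy_start", 1),
         ("text", "comms good, thrusters good, life support good, "
                  "magnetic-artificial-gravity "),
         ("copy_start", 2),
         ("text", "working intermittently due to a swarm of flies "
                  "in the grav-capacitor"),
         ("copy_end", 2), ("copy_end", 1),
         ("text", " from a meal someone left out for days")]
decode_with_buffers(first, buffers)
# A later transmission: a single paste token (mode character + buffer number).
print(decode_with_buffers([("paste", 1)], buffers))
```

With a 2-bit buffer number, each repeated status report is one paste token of 6 + 2 = 8 bits, or 1 hour + 3 bits.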



                                            You can tweak the exact representation (which send modes exist, how many bits follow each, etc.) to fit your expected communications and improve performance further.



                                            Mode 4: Compression



                                            Just what it sounds like: the data following this mode character is compressed with a pre-agreed compression algorithm.
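As one concrete choice of algorithm (an assumption; the answer does not name one), Python's standard `zlib` module supports DEFLATE with a preset dictionary via the `zdict` argument. That dictionary plays the role of Mode 1's pre-agreed phrase table. The dictionary contents below are hypothetical, and `compress`/`decompress` are illustrative wrapper names.

```python
import zlib

# Hypothetical pre-agreed dictionary, stored on the ship and at base before
# launch. zlib's documentation advises putting the most common strings last.
PRESET = (b"life support good, thrusters good, comms good, "
          b"magnetic-artificial-gravity good")

def compress(msg, zdict=b""):
    # wbits=-15 selects raw deflate (no zlib header or checksum), which
    # matters when every bit costs 12 minutes of airtime.
    extra = {"zdict": zdict} if zdict else {}
    c = zlib.compressobj(level=9, wbits=-15, **extra)
    return c.compress(msg) + c.flush()

def decompress(data, zdict=b""):
    extra = {"zdict": zdict} if zdict else {}
    d = zlib.decompressobj(wbits=-15, **extra)
    return d.decompress(data) + d.flush()

msg = b"comms good, thrusters good, life support good, magnetic-artificial-gravity good"
with_dict = compress(msg, PRESET)
assert decompress(with_dict, PRESET) == msg
print(len(compress(msg)) * 8, "bits without dictionary,", len(with_dict) * 8, "bits with")
```

The output is whole bytes, so up to 7 bits of padding can be wasted at the end. A custom bit-level format could avoid that.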



                                            Multiple versions of each mode



                                            If you drop lower case and still have character slots left over after implementing all the punctuation and modes you want, you can use the leftover slots to expand modes you already have. I did this above with the lookup tables: I suggested using 2 of the base64 character slots freed up by dropping lower case to give us 2 separate lookup tables. You could do the same to double your look-back reach or your number of copy/paste buffers. You can also increase or decrease the number of bits following a mode character. For example, 4 bits after paste give 16 paste buffers, while only 2 extra bits save transmission time but allow only 4 paste buffers.



                                            So how efficient is this?



                                            In the worst case this requires 6 bits per character. In the average case you will use a few lookups, look-backs, or some compression to beat the worst case, so you might need roughly 3-5 bits per character. The best case is messages that can be relayed entirely, or nearly entirely, by lookups and look-backs. That should be common for normal day-to-day activities that go as expected. For such routine communication, with a well-tuned number of bits for each special mode, you should achieve better than 1 bit per character, often much better, as with the status report example in Mode 3.
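The efficiency figures can be checked with simple arithmetic. This Python snippet reuses numbers from the examples in this answer: the 120-bit Mode 1 message and a paste token with an assumed 2-bit buffer number. `bits_per_char` is a hypothetical helper.

```python
def bits_per_char(n_bits, text):
    """Average transmitted bits per character of the decoded message."""
    return n_bits / len(text)

lookup_msg = "communication array damaged by Klingon torp"
status = ("comms good, thrusters good, life support good, "
          "magnetic-artificial-gravity working intermittently due to a "
          "swarm of flies in the grav-capacitor")

cases = {
    "plain 6-bit text": bits_per_char(6 * len(lookup_msg), lookup_msg),
    "Mode 1 lookups (120 bits)": bits_per_char(120, lookup_msg),
    "repeated status via paste (6 + 2 bits)": bits_per_char(6 + 2, status),
}
for name, bpc in cases.items():
    print(f"{name}: {bpc:.2f} bits/char")
```

This prints 6.00, 2.79 and 0.06 bits per character. The pasted status report is 8 bits spread over 143 characters.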






                                            edited 6 hours ago

























                                            answered 7 hours ago









                                            Aaron













                                            • Your modes 1-3 are already part of common LZ-derived compression algorithms. The "table of common phrases" strategy, for example, is exposed by zlib through methods that set a pre-agreed "dictionary". (Some additional tweaking of the algorithm could probably squeeze the last few bits out of short messages, though.)
                                              – Henning Makholm
                                              5 hours ago
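For reference, Python's standard `zlib` module exposes this preset-dictionary feature through the `zdict` argument of `compressobj` and `decompressobj`. A minimal sketch; the dictionary contents and the message are hypothetical examples.

```python
import zlib

# Hypothetical pre-agreed phrase dictionary, held by both sender and receiver.
SHARED_DICT = b"ALL SYSTEMS NOMINAL. PROCEEDING TO NEXT WAYPOINT. REQUESTING RESUPPLY. "


def compress(msg, zdict=None):
    # Only pass zdict when a dictionary is actually in use.
    comp = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return comp.compress(msg) + comp.flush()


def decompress(blob, zdict=None):
    # The receiver must supply the same dictionary the sender used.
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


msg = b"ALL SYSTEMS NOMINAL. PROCEEDING TO NEXT WAYPOINT."
plain = compress(msg)
primed = compress(msg, SHARED_DICT)
assert decompress(primed, SHARED_DICT) == msg
print(len(msg), len(plain), len(primed))  # primed output is much shorter than plain
```

Because the whole message already appears in the shared dictionary, the compressor can encode it as a back-reference instead of literal characters.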



















                                            Huffman Encoding



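                                            Basically, you want the same approach we use today when writing a .zip file (the DEFLATE compression used in most .zip files relies on Huffman codes). The idea is to take the most common character in the text (probably 'e') and say that it corresponds to the single bit '1'. Then the next most common one ('a', maybe?) gets '01', and the next most common ('t', let's say) gets '001'. This strictly lengthening pattern is a simplified illustration: real Huffman coding derives each code's length from the measured character frequencies, so several characters often end up sharing the same length.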



                                            So, given this system, "eat" = "101001", while "tea" = "001101".
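A minimal sketch of encoding and decoding with the three-entry table from this example (a toy table, not a full alphabet). Decoding can proceed bit by bit because no codeword is a prefix of another.

```python
CODE = {"e": "1", "a": "01", "t": "001"}  # toy table from the example above


def encode(text, code=CODE):
    return "".join(code[ch] for ch in text)


def decode(bits, code=CODE):
    reverse = {word: ch for ch, word in code.items()}
    out, pending = [], ""
    for bit in bits:
        pending += bit
        # Prefix-free code: the first complete codeword we hit is the only possible match.
        if pending in reverse:
            out.append(reverse[pending])
            pending = ""
    if pending:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)


print(encode("eat"), encode("tea"))        # 101001 001101
print(decode("101001"), decode("001101"))  # eat tea
```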



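                                            For a given set of character frequencies, Huffman coding is optimal among codes that give each character its own whole-bit codeword: it gives you access to any number of characters while spending very few bits on the vast majority of the ones you actually use. (Methods such as arithmetic coding can do slightly better by not rounding each character's cost up to a whole number of bits.)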



                                            Note, though: this is most effective when some characters are used far more often than others (as they are in modern English).
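To see why skewed frequencies matter, here is a minimal sketch of standard Huffman tree construction using Python's `heapq`. The character weights are made up for illustration: with steeply falling weights it produces the lengthening 1-bit, 2-bit, 3-bit pattern described above, while equal weights give every character the same length.

```python
import heapq


def huffman_code(freqs):
    """Build a Huffman code {symbol: bit string} from {symbol: weight}."""
    if len(freqs) == 1:
        return {sym: "0" for sym in freqs}
    # Heap entries are (weight, tiebreak, partial code table); the unique tiebreak
    # keeps heapq from ever comparing two dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # the two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + c for sym, c in left.items()}
        merged.update({sym: "1" + c for sym, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def average_bits(freqs, code):
    total = sum(freqs.values())
    return sum(w * len(code[sym]) for sym, w in freqs.items()) / total


skewed = {"e": 8, "a": 4, "t": 2, "o": 1, "n": 1}  # made-up weights
uniform = {"e": 1, "a": 1, "t": 1, "o": 1}

for freqs in (skewed, uniform):
    code = huffman_code(freqs)
    print({s: len(c) for s, c in code.items()}, average_bits(freqs, code))
```

With the skewed weights the code lengths come out as 1, 2, 3, 4 and 4 bits (1.875 bits per character on average); with four equal weights every character gets 2 bits, so the coding gains nothing over a fixed-width code.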



                                            Also, a .zip file usually carries a description of its code table (which bit patterns stand for which characters) so the receiver can decode it. Sending that table every time is wasteful, especially for short messages. However, if every user already holds a well-known table tuned to common English usage, it can work.
















                                                answered 1 hour ago









                                                Bert Haddad





















































                                                    Thanks for contributing an answer to Worldbuilding Stack Exchange!

