How to count in linear time worst-case? Announcing the arrival of Valued Associate #679: Cesar...

All ASCII characters with a given bit count

Why didn't the Space Shuttle bounce back into space as many times as possible so as to lose a lot of kinetic energy up there?

How to have a sharp product image?

What is this word supposed to be?

Unknown code in script

The weakest link

Why do distances seem to matter in the Foundation world?

What *exactly* is electrical current, voltage, and resistance?

How can I wire a 9-position switch so that each position turns on one more LED than the one before?

Is there metaphorical meaning of "aus der Haft entlassen"?

Will I lose my paid in full property

What is purpose of DB Browser(dbbrowser.aspx) under admin tool?

Should the Product Owner dictate what info the UI needs to display?

Why does Arg'[1. + I] return -0.5?

As an international instructor, should I openly talk about my accent?

How to keep bees out of canned beverages?

I preordered a game on my Xbox while on the home screen of my friend's account. Which of us owns the game?

Can you stand up from being prone using Skirmisher outside of your turn?

Jaya, Venerated Firemage + Chandra's Pyrohelix = 4 damage among two targets?

Combinatorics problem, right solution?

How to find if a column is referenced in a computed column?

Bayes factor vs P value

A strange hotel

What makes accurate emulation of old systems a difficult task?



How to count in linear time worst-case?



Announcing the arrival of Valued Associate #679: Cesar Manara
Unicorn Meta Zoo #1: Why another podcast?Sorting a set of $n$ elements containing only $log n$ unique elementsIs there a sorting algorithm of order $n + k log{k}$?Priority queue with unique elements and sublinear time merge?Average Case runtime for random choice searchCan element uniqueness be solved in deterministic linear time?Building static hash table with particular collisionsDoes my simple, static hash table have O(1) worst case lookup?Sorting a set of $n$ elements containing only $log n$ unique elementsCan the HTML5 parsing algorithm be implemented in linear time?Optimal data structure for sorted listDetermine the worst-case complexity that allow you to conclude that a given array with n elements is not sortedFind four sets where each element from those four appears in at least two of those four sets












8












$begingroup$


This question and this question got me thinking a little bit. For sorting an array of length $n$ with $k$ unique elements in $O(n + k log k)$, we need to be able to store counts of values in the array. There are some suggestions, but I'm looking for a way to do this in worst case linear time. More specifically:




Given a list $A$ of $n$ elements with $k$ elements distinct, determine a list of tuples $U = {(x_i, c_i)}^k$ of all unique elements $x_i in A$ such that $c_i$ is the count of element $x_i$ in $A$.




Here are some (failed) ideas I've had and have been suggested:





  1. Balanced Binary Search Tree - With this it will take $O(log k)$ to insert into the tree and increase values. After inserts we could do a tree traversal in $O(k)$. Thus, total time comes out to $O(n log k)$ which is too slow.


  2. Hash Map - With this we can get $O(1)$ expected inserts and thus $O(n)$ expected time. However, this is still not $O(n)$ worst case.


  3. Empty Space Mapping - Find the minimum and maximum element in $A$. Allocate (but do not initialize) enough memory to cover this range. Use this memory basically as a hash map and include a random hash so that we don't try to access corrupted memory. This strategy presents issues. (1) It's probabilistic with very very very low probability of failing, but still not guaranteed. Using memory like this limits us to floating-point or integer constraints.


  4. Associative Arrays - There are many other associative arrays that can be used, similar to hash maps and BSTs, but I am not finding any that match these constraints.


Maybe there is some obvious method I am missing, but I also think it could be potentially not be possible. What are your thoughts?










share|cite|improve this question









$endgroup$








  • 2




    $begingroup$
    It cannot be done in comparison model since the problem of element distinctness has a lower bound of $Omega(nlog n)$ decision-tree complexity.
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    @Apass.Jack, oh right that's correct. A trivial reduction I did not consider. If you write it up as a quick blurb answer, I'll accept.
    $endgroup$
    – ryan
    yesterday










  • $begingroup$
    Why is the HashMap not assured amortized O(n) ?
    $endgroup$
    – javadba
    21 hours ago






  • 1




    $begingroup$
    @javadba For example, suppose all elements are hashed to the same value.
    $endgroup$
    – Apass.Jack
    20 hours ago










  • $begingroup$
    Ah ok so if it's an imperfect hashing.
    $endgroup$
    – javadba
    20 hours ago


















8












$begingroup$


This question and this question got me thinking a little bit. For sorting an array of length $n$ with $k$ unique elements in $O(n + k log k)$, we need to be able to store counts of values in the array. There are some suggestions, but I'm looking for a way to do this in worst case linear time. More specifically:




Given a list $A$ of $n$ elements with $k$ elements distinct, determine a list of tuples $U = {(x_i, c_i)}^k$ of all unique elements $x_i in A$ such that $c_i$ is the count of element $x_i$ in $A$.




Here are some (failed) ideas I've had and have been suggested:





  1. Balanced Binary Search Tree - With this it will take $O(log k)$ to insert into the tree and increase values. After inserts we could do a tree traversal in $O(k)$. Thus, total time comes out to $O(n log k)$ which is too slow.


  2. Hash Map - With this we can get $O(1)$ expected inserts and thus $O(n)$ expected time. However, this is still not $O(n)$ worst case.


  3. Empty Space Mapping - Find the minimum and maximum element in $A$. Allocate (but do not initialize) enough memory to cover this range. Use this memory basically as a hash map and include a random hash so that we don't try to access corrupted memory. This strategy presents issues. (1) It's probabilistic with very very very low probability of failing, but still not guaranteed. Using memory like this limits us to floating-point or integer constraints.


  4. Associative Arrays - There are many other associative arrays that can be used, similar to hash maps and BSTs, but I am not finding any that match these constraints.


Maybe there is some obvious method I am missing, but I also think it could be potentially not be possible. What are your thoughts?










share|cite|improve this question









$endgroup$








  • 2




    $begingroup$
    It cannot be done in comparison model since the problem of element distinctness has a lower bound of $Omega(nlog n)$ decision-tree complexity.
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    @Apass.Jack, oh right that's correct. A trivial reduction I did not consider. If you write it up as a quick blurb answer, I'll accept.
    $endgroup$
    – ryan
    yesterday










  • $begingroup$
    Why is the HashMap not assured amortized O(n) ?
    $endgroup$
    – javadba
    21 hours ago






  • 1




    $begingroup$
    @javadba For example, suppose all elements are hashed to the same value.
    $endgroup$
    – Apass.Jack
    20 hours ago










  • $begingroup$
    Ah ok so if it's an imperfect hashing.
    $endgroup$
    – javadba
    20 hours ago
















8












8








8





$begingroup$


This question and this question got me thinking a little bit. For sorting an array of length $n$ with $k$ unique elements in $O(n + k log k)$, we need to be able to store counts of values in the array. There are some suggestions, but I'm looking for a way to do this in worst case linear time. More specifically:




Given a list $A$ of $n$ elements with $k$ elements distinct, determine a list of tuples $U = {(x_i, c_i)}^k$ of all unique elements $x_i in A$ such that $c_i$ is the count of element $x_i$ in $A$.




Here are some (failed) ideas I've had and have been suggested:





  1. Balanced Binary Search Tree - With this it will take $O(log k)$ to insert into the tree and increase values. After inserts we could do a tree traversal in $O(k)$. Thus, total time comes out to $O(n log k)$ which is too slow.


  2. Hash Map - With this we can get $O(1)$ expected inserts and thus $O(n)$ expected time. However, this is still not $O(n)$ worst case.


  3. Empty Space Mapping - Find the minimum and maximum element in $A$. Allocate (but do not initialize) enough memory to cover this range. Use this memory basically as a hash map and include a random hash so that we don't try to access corrupted memory. This strategy presents issues. (1) It's probabilistic with very very very low probability of failing, but still not guaranteed. Using memory like this limits us to floating-point or integer constraints.


  4. Associative Arrays - There are many other associative arrays that can be used, similar to hash maps and BSTs, but I am not finding any that match these constraints.


Maybe there is some obvious method I am missing, but I also think it could be potentially not be possible. What are your thoughts?










share|cite|improve this question









$endgroup$




This question and this question got me thinking a little bit. For sorting an array of length $n$ with $k$ unique elements in $O(n + k log k)$, we need to be able to store counts of values in the array. There are some suggestions, but I'm looking for a way to do this in worst case linear time. More specifically:




Given a list $A$ of $n$ elements with $k$ elements distinct, determine a list of tuples $U = {(x_i, c_i)}^k$ of all unique elements $x_i in A$ such that $c_i$ is the count of element $x_i$ in $A$.




Here are some (failed) ideas I've had and have been suggested:





  1. Balanced Binary Search Tree - With this it will take $O(log k)$ to insert into the tree and increase values. After inserts we could do a tree traversal in $O(k)$. Thus, total time comes out to $O(n log k)$ which is too slow.


  2. Hash Map - With this we can get $O(1)$ expected inserts and thus $O(n)$ expected time. However, this is still not $O(n)$ worst case.


  3. Empty Space Mapping - Find the minimum and maximum element in $A$. Allocate (but do not initialize) enough memory to cover this range. Use this memory basically as a hash map and include a random hash so that we don't try to access corrupted memory. This strategy presents issues. (1) It's probabilistic with very very very low probability of failing, but still not guaranteed. Using memory like this limits us to floating-point or integer constraints.


  4. Associative Arrays - There are many other associative arrays that can be used, similar to hash maps and BSTs, but I am not finding any that match these constraints.


Maybe there is some obvious method I am missing, but I also think it could be potentially not be possible. What are your thoughts?







algorithms search-trees hash-tables






share|cite|improve this question













share|cite|improve this question











share|cite|improve this question




share|cite|improve this question










asked yesterday









ryanryan

3,2541927




3,2541927








  • 2




    $begingroup$
    It cannot be done in comparison model since the problem of element distinctness has a lower bound of $Omega(nlog n)$ decision-tree complexity.
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    @Apass.Jack, oh right that's correct. A trivial reduction I did not consider. If you write it up as a quick blurb answer, I'll accept.
    $endgroup$
    – ryan
    yesterday










  • $begingroup$
    Why is the HashMap not assured amortized O(n) ?
    $endgroup$
    – javadba
    21 hours ago






  • 1




    $begingroup$
    @javadba For example, suppose all elements are hashed to the same value.
    $endgroup$
    – Apass.Jack
    20 hours ago










  • $begingroup$
    Ah ok so if it's an imperfect hashing.
    $endgroup$
    – javadba
    20 hours ago
















  • 2




    $begingroup$
    It cannot be done in comparison model since the problem of element distinctness has a lower bound of $Omega(nlog n)$ decision-tree complexity.
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    @Apass.Jack, oh right that's correct. A trivial reduction I did not consider. If you write it up as a quick blurb answer, I'll accept.
    $endgroup$
    – ryan
    yesterday










  • $begingroup$
    Why is the HashMap not assured amortized O(n) ?
    $endgroup$
    – javadba
    21 hours ago






  • 1




    $begingroup$
    @javadba For example, suppose all elements are hashed to the same value.
    $endgroup$
    – Apass.Jack
    20 hours ago










  • $begingroup$
    Ah ok so if it's an imperfect hashing.
    $endgroup$
    – javadba
    20 hours ago










2




2




$begingroup$
It cannot be done in comparison model since the problem of element distinctness has a lower bound of $Omega(nlog n)$ decision-tree complexity.
$endgroup$
– Apass.Jack
yesterday






$begingroup$
It cannot be done in comparison model since the problem of element distinctness has a lower bound of $Omega(nlog n)$ decision-tree complexity.
$endgroup$
– Apass.Jack
yesterday














$begingroup$
@Apass.Jack, oh right that's correct. A trivial reduction I did not consider. If you write it up as a quick blurb answer, I'll accept.
$endgroup$
– ryan
yesterday




$begingroup$
@Apass.Jack, oh right that's correct. A trivial reduction I did not consider. If you write it up as a quick blurb answer, I'll accept.
$endgroup$
– ryan
yesterday












$begingroup$
Why is the HashMap not assured amortized O(n) ?
$endgroup$
– javadba
21 hours ago




$begingroup$
Why is the HashMap not assured amortized O(n) ?
$endgroup$
– javadba
21 hours ago




1




1




$begingroup$
@javadba For example, suppose all elements are hashed to the same value.
$endgroup$
– Apass.Jack
20 hours ago




$begingroup$
@javadba For example, suppose all elements are hashed to the same value.
$endgroup$
– Apass.Jack
20 hours ago












$begingroup$
Ah ok so if it's an imperfect hashing.
$endgroup$
– javadba
20 hours ago






$begingroup$
Ah ok so if it's an imperfect hashing.
$endgroup$
– javadba
20 hours ago












3 Answers
3






active

oldest

votes


















5












$begingroup$

This is a nice question.



In the comparison model or, what is more general, the algebraic decision-tree model, the problem of element distinctness has a lower bound of $Theta(nlog n)$ time-complexity in the worst case as said in this Wikipedia article. So there is no algorithm to count distinct elements in linear time in the worst case, even without counting the duplicities.



However, it is not clear whether it can be done in another computational model. It seems unlikely in any reasonable deterministic computational model.






share|cite|improve this answer









$endgroup$













  • $begingroup$
    Is this really an instance of the element distinctness problem? Just generating the tuples doesn't require the check for distinctness. Not disagreeing, just curious.
    $endgroup$
    – mascoj
    21 hours ago








  • 2




    $begingroup$
    What I am saying is, if you can produce that tuple of distinct elements, then you can also solve the problem of element distinctness by checking if the size of the tuple is $n$.
    $endgroup$
    – Apass.Jack
    21 hours ago










  • $begingroup$
    Good call. Thanks
    $endgroup$
    – mascoj
    20 hours ago



















1












$begingroup$

There exist randomized algorithms whose expected running time is $O(n)$; or where the probability that the running time takes longer than $cn$ is exponentially small in $c$.



In particular, randomly choose a 2-universal hash function, then use it to hash all of the elements of the array. This achieves the stated running times, if you choose the length of the output of the 2-universal hash appropriately.



As another example, you can build a randomized algorithm whose worst-case running time is $O(n)$ (it always runs in linear time, no matter what) and has a probability of error of at most $1/2^{100}$. (How? Run the above algorithm, and terminate it if it runs longer than $cn$ steps for some appropriately chosen $c$.) In practice, that's good enough, as the probability that your computer outputs the wrong answer due to a cosmic ray is already much higher than $1/2^{100}$.






share|cite|improve this answer









$endgroup$





















    1












    $begingroup$

    Your approach 3 can be made safe using a solution to exercise 2.12 of Aho, Hopcroft, and Ullman (1974) The Design and Analysis of Computer Algorithms as described, for example, in Using uninitialized memory for fun and profit.



    Basically, in addition to your array of N elements with the counts you have two arrays of N elements and one auxiliary count to create a sparse set indicating which of the counts are valid.



    In C-like pseudocode:



    uint* a = malloc(n);
    uint* b = malloc(n);
    uint* c = malloc(n);
    uint len = 0;

    get_count(uint x) {
    uint idx = a[x];
    return idx >= 0 && idx < len && b[idx] == x ? c[idx] : 0;
    }

    increment_count(uint x) {
    uint idx = a[x];
    if (idx < 0 || idx >= len || b[idx] != x) {
    idx = len;
    len++;
    a[x] = idx;
    b[idx] = x;
    c[idx] = 0;
    }
    c[idx]++;
    }


    Practical implementation of the sparse set is discussed in this StackOverflow answer.






    share|cite|improve this answer











    $endgroup$













    • $begingroup$
      PS c could be indexed on x or idx, but I used idx for better cache locality.
      $endgroup$
      – Peter Taylor
      9 hours ago










    • $begingroup$
      I like the answer, but I've confused about what makes this safe. While, entirely improbable, couldn't you access a memory cell, which by some miracle has a "valid" entry in it even though it was never put there. If you just got unlucky with malloc?
      $endgroup$
      – ryan
      1 hour ago






    • 1




      $begingroup$
      This solution only works if you have a large enough memory: if all array elements are in the range $1..u$, then you need memory of size at least $u$. In practice this is very limiting. The way we create a large virtual address space in practice is using page tables, which are a tree-based data structure; the hardware invisibly follows page tables for us. As a result, while we think of memory access as taking $O(1)$ time, if you're working in a large memory address space, each memory access actually takes logarithmic time (to traverse the page table tree structure).
      $endgroup$
      – D.W.
      1 hour ago










    • $begingroup$
      @ryan, see research.swtch.com/sparse for what makes it safe. It's definitely a very clever trick.
      $endgroup$
      – D.W.
      1 hour ago














    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "419"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f108465%2fhow-to-count-in-linear-time-worst-case%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    5












    $begingroup$

    This is a nice question.



    In the comparison model or, what is more general, the algebraic decision-tree model, the problem of element distinctness has a lower bound of $Theta(nlog n)$ time-complexity in the worst case as said in this Wikipedia article. So there is no algorithm to count distinct elements in linear time in the worst case, even without counting the duplicities.



    However, it is not clear whether it can be done in another computational model. It seems unlikely in any reasonable deterministic computational model.






    share|cite|improve this answer









    $endgroup$













    • $begingroup$
      Is this really an instance of the element distinctness problem? Just generating the tuples doesn't require the check for distinctness. Not disagreeing, just curious.
      $endgroup$
      – mascoj
      21 hours ago








    • 2




      $begingroup$
      What I am saying is, if you can produce that tuple of distinct elements, then you can also solve the problem of element distinctness by checking if the size of the tuple is $n$.
      $endgroup$
      – Apass.Jack
      21 hours ago










    • $begingroup$
      Good call. Thanks
      $endgroup$
      – mascoj
      20 hours ago
















    5












    $begingroup$

    This is a nice question.



    In the comparison model or, what is more general, the algebraic decision-tree model, the problem of element distinctness has a lower bound of $Theta(nlog n)$ time-complexity in the worst case as said in this Wikipedia article. So there is no algorithm to count distinct elements in linear time in the worst case, even without counting the duplicities.



    However, it is not clear whether it can be done in another computational model. It seems unlikely in any reasonable deterministic computational model.






    share|cite|improve this answer









    $endgroup$













    • $begingroup$
      Is this really an instance of the element distinctness problem? Just generating the tuples doesn't require the check for distinctness. Not disagreeing, just curious.
      $endgroup$
      – mascoj
      21 hours ago








    • 2




      $begingroup$
      What I am saying is, if you can produce that tuple of distinct elements, then you can also solve the problem of element distinctness by checking if the size of the tuple is $n$.
      $endgroup$
      – Apass.Jack
      21 hours ago










    • $begingroup$
      Good call. Thanks
      $endgroup$
      – mascoj
      20 hours ago














    5












    5








    5





    $begingroup$

    This is a nice question.



    In the comparison model or, what is more general, the algebraic decision-tree model, the problem of element distinctness has a lower bound of $Theta(nlog n)$ time-complexity in the worst case as said in this Wikipedia article. So there is no algorithm to count distinct elements in linear time in the worst case, even without counting the duplicities.



    However, it is not clear whether it can be done in another computational model. It seems unlikely in any reasonable deterministic computational model.






    share|cite|improve this answer









    $endgroup$



    This is a nice question.



    In the comparison model or, what is more general, the algebraic decision-tree model, the problem of element distinctness has a lower bound of $Theta(nlog n)$ time-complexity in the worst case as said in this Wikipedia article. So there is no algorithm to count distinct elements in linear time in the worst case, even without counting the duplicities.



    However, it is not clear whether it can be done in another computational model. It seems unlikely in any reasonable deterministic computational model.







    share|cite|improve this answer












    share|cite|improve this answer



    share|cite|improve this answer










    answered yesterday









    Apass.JackApass.Jack

    14.6k1940




    14.6k1940












    • $begingroup$
      Is this really an instance of the element distinctness problem? Just generating the tuples doesn't require the check for distinctness. Not disagreeing, just curious.
      $endgroup$
      – mascoj
      21 hours ago








    • 2




      $begingroup$
      What I am saying is, if you can produce that tuple of distinct elements, then you can also solve the problem of element distinctness by checking if the size of the tuple is $n$.
      $endgroup$
      – Apass.Jack
      21 hours ago










    • $begingroup$
      Good call. Thanks
      $endgroup$
      – mascoj
      20 hours ago


















    • $begingroup$
      Is this really an instance of the element distinctness problem? Just generating the tuples doesn't require the check for distinctness. Not disagreeing, just curious.
      $endgroup$
      – mascoj
      21 hours ago








    • 2




      $begingroup$
      What I am saying is, if you can produce that tuple of distinct elements, then you can also solve the problem of element distinctness by checking if the size of the tuple is $n$.
      $endgroup$
      – Apass.Jack
      21 hours ago










    • $begingroup$
      Good call. Thanks
      $endgroup$
      – mascoj
      20 hours ago
















    $begingroup$
    Is this really an instance of the element distinctness problem? Just generating the tuples doesn't require the check for distinctness. Not disagreeing, just curious.
    $endgroup$
    – mascoj
    21 hours ago






    $begingroup$
    Is this really an instance of the element distinctness problem? Just generating the tuples doesn't require the check for distinctness. Not disagreeing, just curious.
    $endgroup$
    – mascoj
    21 hours ago






    2




    2




    $begingroup$
    What I am saying is, if you can produce that tuple of distinct elements, then you can also solve the problem of element distinctness by checking if the size of the tuple is $n$.
    $endgroup$
    – Apass.Jack
    21 hours ago




    $begingroup$
    What I am saying is, if you can produce that tuple of distinct elements, then you can also solve the problem of element distinctness by checking if the size of the tuple is $n$.
    $endgroup$
    – Apass.Jack
    21 hours ago












    $begingroup$
    Good call. Thanks
    $endgroup$
    – mascoj
    20 hours ago




    $begingroup$
    Good call. Thanks
    $endgroup$
    – mascoj
    20 hours ago











    1












    $begingroup$

    There exist randomized algorithms whose expected running time is $O(n)$; or where the probability that the running time takes longer than $cn$ is exponentially small in $c$.



    In particular, randomly choose a 2-universal hash function, then use it to hash all of the elements of the array. This achieves the stated running times, if you choose the length of the output of the 2-universal hash appropriately.



    As another example, you can build a randomized algorithm whose worst-case running time is $O(n)$ (it always runs in linear time, no matter what) and has a probability of error of at most $1/2^{100}$. (How? Run the above algorithm, and terminate it if it runs longer than $cn$ steps for some appropriately chosen $c$.) In practice, that's good enough, as the probability that your computer outputs the wrong answer due to a cosmic ray is already much higher than $1/2^{100}$.






    share|cite|improve this answer









    $endgroup$


















      1












      $begingroup$

      There exist randomized algorithms whose expected running time is $O(n)$; or where the probability that the running time takes longer than $cn$ is exponentially small in $c$.



      In particular, randomly choose a 2-universal hash function, then use it to hash all of the elements of the array. This achieves the stated running times, if you choose the length of the output of the 2-universal hash appropriately.



      As another example, you can build a randomized algorithm whose worst-case running time is $O(n)$ (it always runs in linear time, no matter what) and has a probability of error of at most $1/2^{100}$. (How? Run the above algorithm, and terminate it if it runs longer than $cn$ steps for some appropriately chosen $c$.) In practice, that's good enough, as the probability that your computer outputs the wrong answer due to a cosmic ray is already much higher than $1/2^{100}$.






      share|cite|improve this answer









      $endgroup$
















        1












        1








        1





        $begingroup$

        There exist randomized algorithms whose expected running time is $O(n)$; or where the probability that the running time takes longer than $cn$ is exponentially small in $c$.



        In particular, randomly choose a 2-universal hash function, then use it to hash all of the elements of the array. This achieves the stated running times, if you choose the length of the output of the 2-universal hash appropriately.



        As another example, you can build a randomized algorithm whose worst-case running time is $O(n)$ (it always runs in linear time, no matter what) and has a probability of error of at most $1/2^{100}$. (How? Run the above algorithm, and terminate it if it runs longer than $cn$ steps for some appropriately chosen $c$.) In practice, that's good enough, as the probability that your computer outputs the wrong answer due to a cosmic ray is already much higher than $1/2^{100}$.






        share|cite|improve this answer









        $endgroup$



        There exist randomized algorithms whose expected running time is $O(n)$; or where the probability that the running time takes longer than $cn$ is exponentially small in $c$.



        In particular, randomly choose a 2-universal hash function, then use it to hash all of the elements of the array. This achieves the stated running times, if you choose the length of the output of the 2-universal hash appropriately.



        As another example, you can build a randomized algorithm whose worst-case running time is $O(n)$ (it always runs in linear time, no matter what) and has a probability of error of at most $1/2^{100}$. (How? Run the above algorithm, and terminate it if it runs longer than $cn$ steps for some appropriately chosen $c$.) In practice, that's good enough, as the probability that your computer outputs the wrong answer due to a cosmic ray is already much higher than $1/2^{100}$.







        share|cite|improve this answer












        share|cite|improve this answer



        share|cite|improve this answer










        answered yesterday









        D.W.D.W.

        104k14131298




        104k14131298























            1












            $begingroup$

            Your approach 3 can be made safe using a solution to exercise 2.12 of Aho, Hopcroft, and Ullman (1974) The Design and Analysis of Computer Algorithms as described, for example, in Using uninitialized memory for fun and profit.



            Basically, in addition to your array of N elements with the counts you have two arrays of N elements and one auxiliary count to create a sparse set indicating which of the counts are valid.



            In C-like pseudocode:



            uint* a = malloc(n);
            uint* b = malloc(n);
            uint* c = malloc(n);
            uint len = 0;

            get_count(uint x) {
            uint idx = a[x];
            return idx >= 0 && idx < len && b[idx] == x ? c[idx] : 0;
            }

            increment_count(uint x) {
            uint idx = a[x];
            if (idx < 0 || idx >= len || b[idx] != x) {
            idx = len;
            len++;
            a[x] = idx;
            b[idx] = x;
            c[idx] = 0;
            }
            c[idx]++;
            }


            Practical implementation of the sparse set is discussed in this StackOverflow answer.






            share|cite|improve this answer











            $endgroup$













            • $begingroup$
              PS c could be indexed on x or idx, but I used idx for better cache locality.
              $endgroup$
              – Peter Taylor
              9 hours ago










            • $begingroup$
              I like the answer, but I've confused about what makes this safe. While, entirely improbable, couldn't you access a memory cell, which by some miracle has a "valid" entry in it even though it was never put there. If you just got unlucky with malloc?
              $endgroup$
              – ryan
              1 hour ago






            • 1




              $begingroup$
              This solution only works if you have a large enough memory: if all array elements are in the range $1..u$, then you need memory of size at least $u$. In practice this is very limiting. The way we create a large virtual address space in practice is using page tables, which are a tree-based data structure; the hardware invisibly follows page tables for us. As a result, while we think of memory access as taking $O(1)$ time, if you're working in a large memory address space, each memory access actually takes logarithmic time (to traverse the page table tree structure).
              $endgroup$
              – D.W.
              1 hour ago










            • $begingroup$
              @ryan, see research.swtch.com/sparse for what makes it safe. It's definitely a very clever trick.
              $endgroup$
              – D.W.
              1 hour ago


















            1












            $begingroup$

            Your approach 3 can be made safe using a solution to exercise 2.12 of Aho, Hopcroft, and Ullman (1974) The Design and Analysis of Computer Algorithms as described, for example, in Using uninitialized memory for fun and profit.



            Basically, in addition to your array of N elements with the counts you have two arrays of N elements and one auxiliary count to create a sparse set indicating which of the counts are valid.



            In C-like pseudocode:



            uint* a = malloc(n);
            uint* b = malloc(n);
            uint* c = malloc(n);
            uint len = 0;

            get_count(uint x) {
            uint idx = a[x];
            return idx >= 0 && idx < len && b[idx] == x ? c[idx] : 0;
            }

            increment_count(uint x) {
            uint idx = a[x];
            if (idx < 0 || idx >= len || b[idx] != x) {
            idx = len;
            len++;
            a[x] = idx;
            b[idx] = x;
            c[idx] = 0;
            }
            c[idx]++;
            }


            Practical implementation of the sparse set is discussed in this StackOverflow answer.






            share|cite|improve this answer











            $endgroup$













            • $begingroup$
              PS c could be indexed on x or idx, but I used idx for better cache locality.
              $endgroup$
              – Peter Taylor
              9 hours ago










            • $begingroup$
              I like the answer, but I've confused about what makes this safe. While, entirely improbable, couldn't you access a memory cell, which by some miracle has a "valid" entry in it even though it was never put there. If you just got unlucky with malloc?
              $endgroup$
              – ryan
              1 hour ago






            • 1




              $begingroup$
              This solution only works if you have a large enough memory: if all array elements are in the range $1..u$, then you need memory of size at least $u$. In practice this is very limiting. The way we create a large virtual address space in practice is using page tables, which are a tree-based data structure; the hardware invisibly follows page tables for us. As a result, while we think of memory access as taking $O(1)$ time, if you're working in a large memory address space, each memory access actually takes logarithmic time (to traverse the page table tree structure).
              $endgroup$
              – D.W.
              1 hour ago










            • $begingroup$
              @ryan, see research.swtch.com/sparse for what makes it safe. It's definitely a very clever trick.
              $endgroup$
              – D.W.
              1 hour ago
















            1












            1








            1





            $begingroup$

            Your approach 3 can be made safe using a solution to exercise 2.12 of Aho, Hopcroft, and Ullman (1974) The Design and Analysis of Computer Algorithms as described, for example, in Using uninitialized memory for fun and profit.



            Basically, in addition to your array of N elements with the counts you have two arrays of N elements and one auxiliary count to create a sparse set indicating which of the counts are valid.



            In C-like pseudocode:



            uint* a = malloc(n);
            uint* b = malloc(n);
            uint* c = malloc(n);
            uint len = 0;

            get_count(uint x) {
            uint idx = a[x];
            return idx >= 0 && idx < len && b[idx] == x ? c[idx] : 0;
            }

            increment_count(uint x) {
            uint idx = a[x];
            if (idx < 0 || idx >= len || b[idx] != x) {
            idx = len;
            len++;
            a[x] = idx;
            b[idx] = x;
            c[idx] = 0;
            }
            c[idx]++;
            }


            Practical implementation of the sparse set is discussed in this StackOverflow answer.






            share|cite|improve this answer











            $endgroup$



            Your approach 3 can be made safe using a solution to exercise 2.12 of Aho, Hopcroft, and Ullman (1974) The Design and Analysis of Computer Algorithms as described, for example, in Using uninitialized memory for fun and profit.



            Basically, in addition to your array of N elements with the counts you have two arrays of N elements and one auxiliary count to create a sparse set indicating which of the counts are valid.



            In C-like pseudocode:



            uint* a = malloc(n);
            uint* b = malloc(n);
            uint* c = malloc(n);
            uint len = 0;

            get_count(uint x) {
            uint idx = a[x];
            return idx >= 0 && idx < len && b[idx] == x ? c[idx] : 0;
            }

            increment_count(uint x) {
            uint idx = a[x];
            if (idx < 0 || idx >= len || b[idx] != x) {
            idx = len;
            len++;
            a[x] = idx;
            b[idx] = x;
            c[idx] = 0;
            }
            c[idx]++;
            }


            Practical implementation of the sparse set is discussed in this StackOverflow answer.







            share|cite|improve this answer














            share|cite|improve this answer



            share|cite|improve this answer








            edited 10 hours ago

























            answered 10 hours ago









            Peter TaylorPeter Taylor

            1,356611




            1,356611












            • $begingroup$
              PS c could be indexed on x or idx, but I used idx for better cache locality.
              $endgroup$
              – Peter Taylor
              9 hours ago










            • $begingroup$
              I like the answer, but I've confused about what makes this safe. While, entirely improbable, couldn't you access a memory cell, which by some miracle has a "valid" entry in it even though it was never put there. If you just got unlucky with malloc?
              $endgroup$
              – ryan
              1 hour ago






            • 1




              $begingroup$
              This solution only works if you have a large enough memory: if all array elements are in the range $1..u$, then you need memory of size at least $u$. In practice this is very limiting. The way we create a large virtual address space in practice is using page tables, which are a tree-based data structure; the hardware invisibly follows page tables for us. As a result, while we think of memory access as taking $O(1)$ time, if you're working in a large memory address space, each memory access actually takes logarithmic time (to traverse the page table tree structure).
              $endgroup$
              – D.W.
              1 hour ago










            • $begingroup$
              @ryan, see research.swtch.com/sparse for what makes it safe. It's definitely a very clever trick.
              $endgroup$
              – D.W.
              1 hour ago




















            • $begingroup$
              PS c could be indexed on x or idx, but I used idx for better cache locality.
              $endgroup$
              – Peter Taylor
              9 hours ago










            • $begingroup$
              I like the answer, but I've confused about what makes this safe. While, entirely improbable, couldn't you access a memory cell, which by some miracle has a "valid" entry in it even though it was never put there. If you just got unlucky with malloc?
              $endgroup$
              – ryan
              1 hour ago






            • 1




              $begingroup$
              This solution only works if you have a large enough memory: if all array elements are in the range $1..u$, then you need memory of size at least $u$. In practice this is very limiting. The way we create a large virtual address space in practice is using page tables, which are a tree-based data structure; the hardware invisibly follows page tables for us. As a result, while we think of memory access as taking $O(1)$ time, if you're working in a large memory address space, each memory access actually takes logarithmic time (to traverse the page table tree structure).
              $endgroup$
              – D.W.
              1 hour ago










            • $begingroup$
              @ryan, see research.swtch.com/sparse for what makes it safe. It's definitely a very clever trick.
              $endgroup$
              – D.W.
              1 hour ago


















            $begingroup$
            PS c could be indexed on x or idx, but I used idx for better cache locality.
            $endgroup$
            – Peter Taylor
            9 hours ago




            $begingroup$
            PS c could be indexed on x or idx, but I used idx for better cache locality.
            $endgroup$
            – Peter Taylor
            9 hours ago












            $begingroup$
            I like the answer, but I've confused about what makes this safe. While, entirely improbable, couldn't you access a memory cell, which by some miracle has a "valid" entry in it even though it was never put there. If you just got unlucky with malloc?
            $endgroup$
            – ryan
            1 hour ago




            $begingroup$
            I like the answer, but I've confused about what makes this safe. While, entirely improbable, couldn't you access a memory cell, which by some miracle has a "valid" entry in it even though it was never put there. If you just got unlucky with malloc?
            $endgroup$
            – ryan
            1 hour ago




            1




            1




            $begingroup$
            This solution only works if you have a large enough memory: if all array elements are in the range $1..u$, then you need memory of size at least $u$. In practice this is very limiting. The way we create a large virtual address space in practice is using page tables, which are a tree-based data structure; the hardware invisibly follows page tables for us. As a result, while we think of memory access as taking $O(1)$ time, if you're working in a large memory address space, each memory access actually takes logarithmic time (to traverse the page table tree structure).
            $endgroup$
            – D.W.
            1 hour ago




            $begingroup$
            This solution only works if you have a large enough memory: if all array elements are in the range $1..u$, then you need memory of size at least $u$. In practice this is very limiting. The way we create a large virtual address space in practice is using page tables, which are a tree-based data structure; the hardware invisibly follows page tables for us. As a result, while we think of memory access as taking $O(1)$ time, if you're working in a large memory address space, each memory access actually takes logarithmic time (to traverse the page table tree structure).
            $endgroup$
            – D.W.
            1 hour ago












            $begingroup$
            @ryan, see research.swtch.com/sparse for what makes it safe. It's definitely a very clever trick.
            $endgroup$
            – D.W.
            1 hour ago






            $begingroup$
            @ryan, see research.swtch.com/sparse for what makes it safe. It's definitely a very clever trick.
            $endgroup$
            – D.W.
            1 hour ago




















            draft saved

            draft discarded




















































            Thanks for contributing an answer to Computer Science Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f108465%2fhow-to-count-in-linear-time-worst-case%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Why not use the yoke to control yaw, as well as pitch and roll? Announcing the arrival of...

            Couldn't open a raw socket. Error: Permission denied (13) (nmap)Is it possible to run networking commands...

            VNC viewer RFB protocol error: bad desktop size 0x0I Cannot Type the Key 'd' (lowercase) in VNC Viewer...