Find the identical rows in a matrixHow to efficiently find positions of duplicates?How to find rows that have...

How can I place the product on a social media post better?

What happened to Captain America in Endgame?

How to pronounce 'C++' in Spanish

Symbolic Multivariate Distribution

Was there a shared-world project before "Thieves World"?

Normal Map bad shading in Rendered display

Pulling the rope with one hand is as heavy as with two hands?

Does a strong solution to a SDE imply lipschitz condition?

Can SQL Server create collisions in system generated constraint names?

How to make a pipeline wait for end-of-file or stop after an error?

Is there really no use for MD5 anymore?

What is the most expensive material in the world that could be used to create Pun-Pun's lute?

How can I practically buy stocks?

Do I have an "anti-research" personality?

How does a program know if stdout is connected to a terminal or a pipe?

In order to check if a field is required or not, is the result of isNillable method sufficient?

Why does processed meat contain preservatives, while canned fish needs not?

What is Niska's accent?

Unexpected email from Yorkshire Bank

How to reduce LED flash rate (frequency)

How to get a plain text file version of a CP/M .BAS (M-BASIC) program?

How could Tony Stark make this in Endgame?

The Defining Moment

Does Gita support doctrine of eternal cycle of birth and death for evil people?



Find the identical rows in a matrix


How to efficiently find positions of duplicates?How to find rows that have maximum value?How to do equality check of a large matrix and get the corresponding index position?List manipulation: Dropping first or last row or column of a matrixIs it possible to colour only one column of a matrix?Make a vector of sums of matrix rowsHow to operate on spans of rows in a matrix?Efficiently select the smallest magnitude element from each column of a matrixMatrix expansion and reorganisationMatrix using For LoopChanging the position of rows and columns in a matrix













6












$begingroup$


Suppose I have the following matrix:



M = 
{{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};

TableForm[M, TableHeadings -> {{S1, S2, S3, S4, S5, S6, S7, S8}}]


matrix



In this case, it turns out that rows (S1, S8), (S2, S3, S4), (S5, S6, S7) have equal element values in identical column positions. I have a 1000 x 1000 matrix to examine and would appreciate any assistance in coding this problem.










share|improve this question











$endgroup$








  • 3




    $begingroup$
    Try Values[PositionIndex[M]]
    $endgroup$
    – Coolwater
    yesterday










  • $begingroup$
    @Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
    $endgroup$
    – Okkes Dulgerci
    yesterday






  • 2




    $begingroup$
    Possible duplicate of How to efficiently find positions of duplicates?
    $endgroup$
    – Michael E2
    yesterday










  • $begingroup$
    @Coolwater IMHO, the best answer is lacking so far. Please consider posting PositionIndex as possible solution.
    $endgroup$
    – Henrik Schumacher
    yesterday
















6












$begingroup$


Suppose I have the following matrix:



M = 
{{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};

TableForm[M, TableHeadings -> {{S1, S2, S3, S4, S5, S6, S7, S8}}]


matrix



In this case, it turns out that rows (S1, S8), (S2, S3, S4), (S5, S6, S7) have equal element values in identical column positions. I have a 1000 x 1000 matrix to examine and would appreciate any assistance in coding this problem.










share|improve this question











$endgroup$








  • 3




    $begingroup$
    Try Values[PositionIndex[M]]
    $endgroup$
    – Coolwater
    yesterday










  • $begingroup$
    @Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
    $endgroup$
    – Okkes Dulgerci
    yesterday






  • 2




    $begingroup$
    Possible duplicate of How to efficiently find positions of duplicates?
    $endgroup$
    – Michael E2
    yesterday










  • $begingroup$
    @Coolwater IMHO, the best answer is lacking so far. Please consider posting PositionIndex as possible solution.
    $endgroup$
    – Henrik Schumacher
    yesterday














6












6








6


2



$begingroup$


Suppose I have the following matrix:



M = 
{{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};

TableForm[M, TableHeadings -> {{S1, S2, S3, S4, S5, S6, S7, S8}}]


matrix



In this case, it turns out that rows (S1, S8), (S2, S3, S4), (S5, S6, S7) have equal element values in identical column positions. I have a 1000 x 1000 matrix to examine and would appreciate any assistance in coding this problem.










share|improve this question











$endgroup$




Suppose I have the following matrix:



M = 
{{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};

TableForm[M, TableHeadings -> {{S1, S2, S3, S4, S5, S6, S7, S8}}]


matrix



In this case, it turns out that rows (S1, S8), (S2, S3, S4), (S5, S6, S7) have equal element values in identical column positions. I have a 1000 x 1000 matrix to examine and would appreciate any assistance in coding this problem.







list-manipulation matrix






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 2 days ago









m_goldberg

89.2k873200




89.2k873200










asked 2 days ago









PRGPRG

1178




1178








  • 3




    $begingroup$
    Try Values[PositionIndex[M]]
    $endgroup$
    – Coolwater
    yesterday










  • $begingroup$
    @Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
    $endgroup$
    – Okkes Dulgerci
    yesterday






  • 2




    $begingroup$
    Possible duplicate of How to efficiently find positions of duplicates?
    $endgroup$
    – Michael E2
    yesterday










  • $begingroup$
    @Coolwater IMHO, the best answer is lacking so far. Please consider posting PositionIndex as possible solution.
    $endgroup$
    – Henrik Schumacher
    yesterday














  • 3




    $begingroup$
    Try Values[PositionIndex[M]]
    $endgroup$
    – Coolwater
    yesterday










  • $begingroup$
    @Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
    $endgroup$
    – Okkes Dulgerci
    yesterday






  • 2




    $begingroup$
    Possible duplicate of How to efficiently find positions of duplicates?
    $endgroup$
    – Michael E2
    yesterday










  • $begingroup$
    @Coolwater IMHO, the best answer is lacking so far. Please consider posting PositionIndex as possible solution.
    $endgroup$
    – Henrik Schumacher
    yesterday








3




3




$begingroup$
Try Values[PositionIndex[M]]
$endgroup$
– Coolwater
yesterday




$begingroup$
Try Values[PositionIndex[M]]
$endgroup$
– Coolwater
yesterday












$begingroup$
@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
$endgroup$
– Okkes Dulgerci
yesterday




$begingroup$
@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
$endgroup$
– Okkes Dulgerci
yesterday




2




2




$begingroup$
Possible duplicate of How to efficiently find positions of duplicates?
$endgroup$
– Michael E2
yesterday




$begingroup$
Possible duplicate of How to efficiently find positions of duplicates?
$endgroup$
– Michael E2
yesterday












$begingroup$
@Coolwater IMHO, the best answer is lacking so far. Please consider posting PositionIndex as possible solution.
$endgroup$
– Henrik Schumacher
yesterday




$begingroup$
@Coolwater IMHO, the best answer is lacking so far. Please consider posting PositionIndex as possible solution.
$endgroup$
– Henrik Schumacher
yesterday










4 Answers
4






active

oldest

votes


















8












$begingroup$

idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]



{{1, 8}, {2, 3, 4}, {5, 6, 7}}




In order to obtain the labels of the rows, you may use the following:



labels = {S1, S2, S3, S4, S5, S6, S7, S8};
Map[labels[[#]] &, idx, {2}]



{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}







share|improve this answer











$endgroup$













  • $begingroup$
    Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
    $endgroup$
    – PRG
    2 days ago










  • $begingroup$
    MANY THANKS, HENRIK!
    $endgroup$
    – PRG
    2 days ago










  • $begingroup$
    YOU'RE WELCOME, PRG! =D
    $endgroup$
    – Henrik Schumacher
    2 days ago



















6












$begingroup$

The function positionDuplicates [] from How to efficiently find positions of duplicates? does the job, faster than Nearest.



(* Henrik's method *)
posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]

SeedRandom[0]; (* to make a reproducible 1000 x 1000 matrix *)
foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];

d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
(* {0.017, Null} *)

d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
(* {0.060, Null} *)

d1 == d2
(* True *)

d1
(*
{{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},
{173, 516}, {544, 816}, {610, 720}}
*)





share|improve this answer









$endgroup$









  • 1




    $begingroup$
    Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []
    $endgroup$
    – Okkes Dulgerci
    yesterday










  • $begingroup$
    @OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.
    $endgroup$
    – Michael E2
    yesterday








  • 3




    $begingroup$
    @OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.
    $endgroup$
    – Michael E2
    yesterday



















3












$begingroup$

The following shows an order of magnitude difference between the most efficient ordering solution and the least efficient gatherBy solution for a 1000 x 1000 matrix as required by the OP.



ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];
sort[M_] :=
Sort[#[[All, 2]] & /@
SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];
gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];
nearest[M_] :=
DeleteDuplicates[
Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];
positionIndex[M_] := Values@PositionIndex@M;

SetAttributes[speedTest, HoldAll];

speedTest[functions_, opts : OptionsPattern[]] :=
Function[inp, speedTest[functions, inp, opts], HoldAll]

speedTest[functions_, inp_, OptionsPattern[]] := Module[{
tm = Function[fn,
Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],
timesOutputs, times, sameQ},
timesOutputs = Through[(tm /@ functions)@inp];
times =
SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;
sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);
If[OptionValue@"CheckOutputs",
Labeled[times,
Row[{ToString@Unevaluated@inp, Spacer@80,
If[sameQ, Style["[Checkmark]", Green, 20],
Style["x", Red, 20]]}], Top], times]
];

Options[speedTest] = {"CheckOutputs" -> True};

M1000 = RandomInteger[{0, 1}, {1000, 1000}];

speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000


enter image description here



Note the green tick indicates the same outputs. The same order and magnitude difference is evident for a 10 0000 x 10 000 matrix with the ordering solution still the fastest but with the sort solution now becoming second fastest.



M10000 = RandomInteger[{0, 1}, {10000, 10000}];
speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000


enter image description here






share|improve this answer











$endgroup$





















    1












    $begingroup$

    I'd use GroupBy.



    First the names of the rows: can be anything you like, for example



    rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]



    {S1, S2, S3, S4, S5, S6, S7, S8}




    Next the grouping:



    groups = GroupBy[Thread[rownames -> M], Last -> First]



    <|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
    {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
    {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>




    If all you need are the names:



    Values[groups]



    {{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}







    share|improve this answer









    $endgroup$














      Your Answer








      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "387"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f197039%2ffind-the-identical-rows-in-a-matrix%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      8












      $begingroup$

      idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]



      {{1, 8}, {2, 3, 4}, {5, 6, 7}}




      In order to obtain the labels of the rows, you may use the following:



      labels = {S1, S2, S3, S4, S5, S6, S7, S8};
      Map[labels[[#]] &, idx, {2}]



      {{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}







      share|improve this answer











      $endgroup$













      • $begingroup$
        Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
        $endgroup$
        – PRG
        2 days ago










      • $begingroup$
        MANY THANKS, HENRIK!
        $endgroup$
        – PRG
        2 days ago










      • $begingroup$
        YOU'RE WELCOME, PRG! =D
        $endgroup$
        – Henrik Schumacher
        2 days ago
















      8












      $begingroup$

      idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]



      {{1, 8}, {2, 3, 4}, {5, 6, 7}}




      In order to obtain the labels of the rows, you may use the following:



      labels = {S1, S2, S3, S4, S5, S6, S7, S8};
      Map[labels[[#]] &, idx, {2}]



      {{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}







      share|improve this answer











      $endgroup$













      • $begingroup$
        Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
        $endgroup$
        – PRG
        2 days ago










      • $begingroup$
        MANY THANKS, HENRIK!
        $endgroup$
        – PRG
        2 days ago










      • $begingroup$
        YOU'RE WELCOME, PRG! =D
        $endgroup$
        – Henrik Schumacher
        2 days ago














      8












      8








      8





      $begingroup$

      idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]



      {{1, 8}, {2, 3, 4}, {5, 6, 7}}




      In order to obtain the labels of the rows, you may use the following:



      labels = {S1, S2, S3, S4, S5, S6, S7, S8};
      Map[labels[[#]] &, idx, {2}]



      {{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}







      share|improve this answer











      $endgroup$



      idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]



      {{1, 8}, {2, 3, 4}, {5, 6, 7}}




      In order to obtain the labels of the rows, you may use the following:



      labels = {S1, S2, S3, S4, S5, S6, S7, S8};
      Map[labels[[#]] &, idx, {2}]



      {{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}








      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited 2 days ago

























      answered 2 days ago









      Henrik SchumacherHenrik Schumacher

      61.3k585171




      61.3k585171












      • $begingroup$
        Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
        $endgroup$
        – PRG
        2 days ago










      • $begingroup$
        MANY THANKS, HENRIK!
        $endgroup$
        – PRG
        2 days ago










      • $begingroup$
        YOU'RE WELCOME, PRG! =D
        $endgroup$
        – Henrik Schumacher
        2 days ago


















      • $begingroup$
        Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
        $endgroup$
        – PRG
        2 days ago










      • $begingroup$
        MANY THANKS, HENRIK!
        $endgroup$
        – PRG
        2 days ago










      • $begingroup$
        YOU'RE WELCOME, PRG! =D
        $endgroup$
        – Henrik Schumacher
        2 days ago
















      $begingroup$
      Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
      $endgroup$
      – PRG
      2 days ago




      $begingroup$
      Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
      $endgroup$
      – PRG
      2 days ago












      $begingroup$
      MANY THANKS, HENRIK!
      $endgroup$
      – PRG
      2 days ago




      $begingroup$
      MANY THANKS, HENRIK!
      $endgroup$
      – PRG
      2 days ago












      $begingroup$
      YOU'RE WELCOME, PRG! =D
      $endgroup$
      – Henrik Schumacher
      2 days ago




      $begingroup$
      YOU'RE WELCOME, PRG! =D
      $endgroup$
      – Henrik Schumacher
      2 days ago











      6












      $begingroup$

      The function positionDuplicates [] from How to efficiently find positions of duplicates? does the job, faster than Nearest.



      (* Henrik's method *)
      posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]

      SeedRandom[0]; (* to make a reproducible 1000 x 1000 matrix *)
      foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];

      d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
      (* {0.017, Null} *)

      d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
      (* {0.060, Null} *)

      d1 == d2
      (* True *)

      d1
      (*
      {{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},
      {173, 516}, {544, 816}, {610, 720}}
      *)





      share|improve this answer









      $endgroup$









      • 1




        $begingroup$
        Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []
        $endgroup$
        – Okkes Dulgerci
        yesterday










      • $begingroup$
        @OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.
        $endgroup$
        – Michael E2
        yesterday








      • 3




        $begingroup$
        @OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.
        $endgroup$
        – Michael E2
        yesterday
















      6












      $begingroup$

      The function positionDuplicates [] from How to efficiently find positions of duplicates? does the job, faster than Nearest.



      (* Henrik's method *)
      posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]

      SeedRandom[0]; (* to make a reproducible 1000 x 1000 matrix *)
      foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];

      d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
      (* {0.017, Null} *)

      d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
      (* {0.060, Null} *)

      d1 == d2
      (* True *)

      d1
      (*
      {{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},
      {173, 516}, {544, 816}, {610, 720}}
      *)





      share|improve this answer









      $endgroup$









      • 1




        $begingroup$
        Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []
        $endgroup$
        – Okkes Dulgerci
        yesterday










      • $begingroup$
        @OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.
        $endgroup$
        – Michael E2
        yesterday








      • 3




        $begingroup$
        @OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.
        $endgroup$
        – Michael E2
        yesterday














      6












      6








      6





      $begingroup$

      The function positionDuplicates [] from How to efficiently find positions of duplicates? does the job, faster than Nearest.



      (* Henrik's method *)
      posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]

      SeedRandom[0]; (* to make a reproducible 1000 x 1000 matrix *)
      foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];

      d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
      (* {0.017, Null} *)

      d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
      (* {0.060, Null} *)

      d1 == d2
      (* True *)

      d1
      (*
      {{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},
      {173, 516}, {544, 816}, {610, 720}}
      *)





      share|improve this answer









      $endgroup$



      The function positionDuplicates [] from How to efficiently find positions of duplicates? does the job, faster than Nearest.



      (* Henrik's method *)
      posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]

      SeedRandom[0]; (* to make a reproducible 1000 x 1000 matrix *)
      foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];

      d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
      (* {0.017, Null} *)

      d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
      (* {0.060, Null} *)

      d1 == d2
      (* True *)

      d1
      (*
      {{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},
      {173, 516}, {544, 816}, {610, 720}}
      *)






      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered yesterday









      Michael E2Michael E2

      151k12203483




      151k12203483








      • 1




        $begingroup$
        Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []
        $endgroup$
        – Okkes Dulgerci
        yesterday










      • $begingroup$
        @OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.
        $endgroup$
        – Michael E2
        yesterday








      • 3




        $begingroup$
        @OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.
        $endgroup$
        – Michael E2
        yesterday














      • 1




        $begingroup$
        Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []
        $endgroup$
        – Okkes Dulgerci
        yesterday










      • $begingroup$
        @OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.
        $endgroup$
        – Michael E2
        yesterday








      • 3




        $begingroup$
        @OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.
        $endgroup$
        – Michael E2
        yesterday








      1




      1




      $begingroup$
      Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []
      $endgroup$
      – Okkes Dulgerci
      yesterday




      $begingroup$
      Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []
      $endgroup$
      – Okkes Dulgerci
      yesterday












      $begingroup$
      @OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.
      $endgroup$
      – Michael E2
      yesterday






      $begingroup$
      @OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.
      $endgroup$
      – Michael E2
      yesterday






      3




      3




      $begingroup$
      @OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.
      $endgroup$
      – Michael E2
      yesterday




      $begingroup$
      @OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.
      $endgroup$
      – Michael E2
      yesterday











      3












      $begingroup$

      The following shows an order of magnitude difference between the most efficient ordering solution and the least efficient gatherBy solution for a 1000 x 1000 matrix as required by the OP.



      ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];
      sort[M_] :=
      Sort[#[[All, 2]] & /@
      SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];
      gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];
      nearest[M_] :=
      DeleteDuplicates[
      Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];
      positionIndex[M_] := Values@PositionIndex@M;

      SetAttributes[speedTest, HoldAll];

      speedTest[functions_, opts : OptionsPattern[]] :=
      Function[inp, speedTest[functions, inp, opts], HoldAll]

      speedTest[functions_, inp_, OptionsPattern[]] := Module[{
      tm = Function[fn,
      Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],
      timesOutputs, times, sameQ},
      timesOutputs = Through[(tm /@ functions)@inp];
      times =
      SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;
      sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);
      If[OptionValue@"CheckOutputs",
      Labeled[times,
      Row[{ToString@Unevaluated@inp, Spacer@80,
      If[sameQ, Style["[Checkmark]", Green, 20],
      Style["x", Red, 20]]}], Top], times]
      ];

      Options[speedTest] = {"CheckOutputs" -> True};

      M1000 = RandomInteger[{0, 1}, {1000, 1000}];

      speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000


      enter image description here



      Note the green tick indicates the same outputs. The same order and magnitude difference is evident for a 10 0000 x 10 000 matrix with the ordering solution still the fastest but with the sort solution now becoming second fastest.



      M10000 = RandomInteger[{0, 1}, {10000, 10000}];
      speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000


      enter image description here






      share|improve this answer











      $endgroup$


















        3












        $begingroup$

        The following shows an order of magnitude difference between the most efficient ordering solution and the least efficient gatherBy solution for a 1000 x 1000 matrix as required by the OP.



        ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];
        sort[M_] :=
        Sort[#[[All, 2]] & /@
        SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];
        gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];
        nearest[M_] :=
        DeleteDuplicates[
        Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];
        positionIndex[M_] := Values@PositionIndex@M;

        SetAttributes[speedTest, HoldAll];

        speedTest[functions_, opts : OptionsPattern[]] :=
        Function[inp, speedTest[functions, inp, opts], HoldAll]

        speedTest[functions_, inp_, OptionsPattern[]] := Module[{
        tm = Function[fn,
        Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],
        timesOutputs, times, sameQ},
        timesOutputs = Through[(tm /@ functions)@inp];
        times =
        SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;
        sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);
        If[OptionValue@"CheckOutputs",
        Labeled[times,
        Row[{ToString@Unevaluated@inp, Spacer@80,
        If[sameQ, Style["[Checkmark]", Green, 20],
        Style["x", Red, 20]]}], Top], times]
        ];

        Options[speedTest] = {"CheckOutputs" -> True};

        M1000 = RandomInteger[{0, 1}, {1000, 1000}];

        speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000


        enter image description here



        Note the green tick indicates the same outputs. The same order and magnitude difference is evident for a 10 0000 x 10 000 matrix with the ordering solution still the fastest but with the sort solution now becoming second fastest.



        M10000 = RandomInteger[{0, 1}, {10000, 10000}];
        speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000


        enter image description here






        share|improve this answer











        $endgroup$
















          3












          3








          3





          $begingroup$

          The following shows an order of magnitude difference between the most efficient ordering solution and the least efficient gatherBy solution for a 1000 x 1000 matrix as required by the OP.



          ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];
          sort[M_] :=
          Sort[#[[All, 2]] & /@
          SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];
          gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];
          nearest[M_] :=
          DeleteDuplicates[
          Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];
          positionIndex[M_] := Values@PositionIndex@M;

          SetAttributes[speedTest, HoldAll];

          speedTest[functions_, opts : OptionsPattern[]] :=
          Function[inp, speedTest[functions, inp, opts], HoldAll]

          speedTest[functions_, inp_, OptionsPattern[]] := Module[{
          tm = Function[fn,
          Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],
          timesOutputs, times, sameQ},
          timesOutputs = Through[(tm /@ functions)@inp];
          times =
          SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;
          sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);
          If[OptionValue@"CheckOutputs",
          Labeled[times,
          Row[{ToString@Unevaluated@inp, Spacer@80,
          If[sameQ, Style["[Checkmark]", Green, 20],
          Style["x", Red, 20]]}], Top], times]
          ];

          Options[speedTest] = {"CheckOutputs" -> True};

          M1000 = RandomInteger[{0, 1}, {1000, 1000}];

          speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000


          enter image description here



          Note the green tick indicates the same outputs. The same order and magnitude difference is evident for a 10 0000 x 10 000 matrix with the ordering solution still the fastest but with the sort solution now becoming second fastest.



          M10000 = RandomInteger[{0, 1}, {10000, 10000}];
          speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000


          enter image description here






          share|improve this answer











          $endgroup$



          The following shows an order of magnitude difference between the most efficient ordering solution and the least efficient gatherBy solution for a 1000 x 1000 matrix as required by the OP.



          ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];
          sort[M_] :=
          Sort[#[[All, 2]] & /@
          SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];
          gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];
          nearest[M_] :=
          DeleteDuplicates[
          Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];
          positionIndex[M_] := Values@PositionIndex@M;

          SetAttributes[speedTest, HoldAll];

          speedTest[functions_, opts : OptionsPattern[]] :=
          Function[inp, speedTest[functions, inp, opts], HoldAll]

          speedTest[functions_, inp_, OptionsPattern[]] := Module[{
          tm = Function[fn,
          Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],
          timesOutputs, times, sameQ},
          timesOutputs = Through[(tm /@ functions)@inp];
          times =
          SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;
          sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);
          If[OptionValue@"CheckOutputs",
          Labeled[times,
          Row[{ToString@Unevaluated@inp, Spacer@80,
          If[sameQ, Style["[Checkmark]", Green, 20],
          Style["x", Red, 20]]}], Top], times]
          ];

          Options[speedTest] = {"CheckOutputs" -> True};

          M1000 = RandomInteger[{0, 1}, {1000, 1000}];

          speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000


          enter image description here



          Note the green tick indicates the same outputs. The same order and magnitude difference is evident for a 10 0000 x 10 000 matrix with the ordering solution still the fastest but with the sort solution now becoming second fastest.



          M10000 = RandomInteger[{0, 1}, {10000, 10000}];
          speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000


          enter image description here







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited yesterday

























          answered yesterday









          Ronald MonsonRonald Monson

          3,2131734




          3,2131734























              1












              $begingroup$

              I'd use GroupBy.



              First the names of the rows: can be anything you like, for example



              rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]



              {S1, S2, S3, S4, S5, S6, S7, S8}




              Next the grouping:



              groups = GroupBy[Thread[rownames -> M], Last -> First]



              <|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
              {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
              {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>




              If all you need are the names:



              Values[groups]



              {{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}







              share|improve this answer









              $endgroup$


















                1












                $begingroup$

                I'd use GroupBy.



                First the names of the rows: can be anything you like, for example



                rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]



                {S1, S2, S3, S4, S5, S6, S7, S8}




                Next the grouping:



                groups = GroupBy[Thread[rownames -> M], Last -> First]



                <|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
                {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
                {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>




                If all you need are the names:



                Values[groups]



                {{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}







                share|improve this answer









                $endgroup$
















                  1












                  1








                  1





                  $begingroup$

                  I'd use GroupBy.



                  First the names of the rows: can be anything you like, for example



                  rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]



                  {S1, S2, S3, S4, S5, S6, S7, S8}




                  Next the grouping:



                  groups = GroupBy[Thread[rownames -> M], Last -> First]



                  <|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
                  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
                  {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>




                  If all you need are the names:



                  Values[groups]



                  {{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}







                  share|improve this answer









                  $endgroup$



                  I'd use GroupBy.



                  First the names of the rows: can be anything you like, for example



                  rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]



                  {S1, S2, S3, S4, S5, S6, S7, S8}




                  Next the grouping:



                  groups = GroupBy[Thread[rownames -> M], Last -> First]



                  <|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
                  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
                  {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>




                  If all you need are the names:



                  Values[groups]



                  {{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}








                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered yesterday









                  RomanRoman

                  6,44611134




                  6,44611134






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Mathematica Stack Exchange!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      Use MathJax to format equations. MathJax reference.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f197039%2ffind-the-identical-rows-in-a-matrix%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      VNC viewer RFB protocol error: bad desktop size 0x0I Cannot Type the Key 'd' (lowercase) in VNC Viewer...

                      Tribunal Administrativo e Fiscal de Mirandela Referências Menu de...

                      looking for continuous Screen Capture for retroactivly reproducing errors, timeback machineRolling desktop...