Find the identical rows in a matrixHow to efficiently find positions of duplicates?How to find rows that have...

How can I place the product on a social media post better?

What happened to Captain America in Endgame?

How to pronounce 'C++' in Spanish

Symbolic Multivariate Distribution

Was there a shared-world project before "Thieves World"?

Normal Map bad shading in Rendered display

Pulling the rope with one hand is as heavy as with two hands?

Does a strong solution to a SDE imply lipschitz condition?

Can SQL Server create collisions in system generated constraint names?

How to make a pipeline wait for end-of-file or stop after an error?

Is there really no use for MD5 anymore?

What is the most expensive material in the world that could be used to create Pun-Pun's lute?

How can I practically buy stocks?

Do I have an "anti-research" personality?

How does a program know if stdout is connected to a terminal or a pipe?

In order to check if a field is required or not, is the result of isNillable method sufficient?

Why does processed meat contain preservatives, while canned fish needs not?

What is Niska's accent?

Unexpected email from Yorkshire Bank

How to reduce LED flash rate (frequency)

How to get a plain text file version of a CP/M .BAS (M-BASIC) program?

How could Tony Stark make this in Endgame?

The Defining Moment

Does Gita support doctrine of eternal cycle of birth and death for evil people?

Find the identical rows in a matrix

How to efficiently find positions of duplicates?How to find rows that have maximum value?How to do equality check of a large matrix and get the corresponding index position?List manipulation: Dropping first or last row or column of a matrixIs it possible to colour only one column of a matrix?Make a vector of sums of matrix rowsHow to operate on spans of rows in a matrix?Efficiently select the smallest magnitude element from each column of a matrixMatrix expansion and reorganisationMatrix using For LoopChanging the position of rows and columns in a matrix

Suppose I have the following matrix:

M = 

 {{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}}; 



TableForm[M, TableHeadings -> {{S1, S2, S3, S4, S5, S6, S7, S8}}]

matrix

In this case, it turns out that rows (S1, S8), (S2, S3, S4), (S5, S6, S7) have equal element values in identical column positions. I have a 1000 x 1000 matrix to examine and would appreciate any assistance in coding this problem.

edited 2 days ago

m_goldberg

89.2k873200

asked 2 days ago

PRG

1178

3

$begingroup$
Try Values[PositionIndex[M]]
$endgroup$
– Coolwater
yesterday

$begingroup$
@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
$endgroup$
– Okkes Dulgerci
yesterday

2

$begingroup$
Possible duplicate of How to efficiently find positions of duplicates?
$endgroup$
– Michael E2
yesterday

$begingroup$
@Coolwater IMHO, the best answer is lacking so far. Please consider posting PositionIndex as possible solution.
$endgroup$
– Henrik Schumacher
yesterday

add a comment |

Suppose I have the following matrix:

M = 

 {{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}}; 



TableForm[M, TableHeadings -> {{S1, S2, S3, S4, S5, S6, S7, S8}}]

matrix

edited 2 days ago

m_goldberg

89.2k873200

asked 2 days ago

PRG

1178

3

$begingroup$
Try Values[PositionIndex[M]]
$endgroup$
– Coolwater
yesterday

$begingroup$
@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
$endgroup$
– Okkes Dulgerci
yesterday

2

$begingroup$
Possible duplicate of How to efficiently find positions of duplicates?
$endgroup$
– Michael E2
yesterday

$begingroup$
@Coolwater IMHO, the best answer is lacking so far. Please consider posting PositionIndex as possible solution.
$endgroup$
– Henrik Schumacher
yesterday

add a comment |

Suppose I have the following matrix:

M = 

 {{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}}; 



TableForm[M, TableHeadings -> {{S1, S2, S3, S4, S5, S6, S7, S8}}]

matrix

edited 2 days ago

m_goldberg

89.2k873200

asked 2 days ago

PRG

1178

Suppose I have the following matrix:

M = 

 {{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0}, 

  {0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}}; 



TableForm[M, TableHeadings -> {{S1, S2, S3, S4, S5, S6, S7, S8}}]

matrix

list-manipulation matrix

edited 2 days ago

m_goldberg

89.2k873200

asked 2 days ago

PRG

1178

edited 2 days ago

m_goldberg

89.2k873200

asked 2 days ago

PRG

1178

edited 2 days ago

m_goldberg

89.2k873200

edited 2 days ago

m_goldberg

89.2k873200

edited 2 days ago

m_goldberg

89.2k873200

asked 2 days ago

PRG

1178

asked 2 days ago

PRG

1178

asked 2 days ago

PRG

1178

3

$begingroup$
Try Values[PositionIndex[M]]
$endgroup$
– Coolwater
yesterday

$begingroup$
@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
$endgroup$
– Okkes Dulgerci
yesterday

2

$begingroup$
Possible duplicate of How to efficiently find positions of duplicates?
$endgroup$
– Michael E2
yesterday

$begingroup$
@Coolwater IMHO, the best answer is lacking so far. Please consider posting PositionIndex as possible solution.
$endgroup$
– Henrik Schumacher
yesterday

add a comment |

3

$begingroup$
Try Values[PositionIndex[M]]
$endgroup$
– Coolwater
yesterday

$begingroup$
@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
$endgroup$
– Okkes Dulgerci
yesterday

2

$begingroup$
Possible duplicate of How to efficiently find positions of duplicates?
$endgroup$
– Michael E2
yesterday

$begingroup$
@Coolwater IMHO, the best answer is lacking so far. Please consider posting PositionIndex as possible solution.
$endgroup$
– Henrik Schumacher
yesterday

Try Values[PositionIndex[M]]

– Coolwater
yesterday

@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1

– Okkes Dulgerci
yesterday

Possible duplicate of How to efficiently find positions of duplicates?

– Michael E2
yesterday

@Coolwater IMHO, the best answer is lacking so far. Please consider posting PositionIndex as possible solution.

– Henrik Schumacher
yesterday

add a comment |

4 Answers
4

active

oldest

votes

idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]

{{1, 8}, {2, 3, 4}, {5, 6, 7}}

In order to obtain the labels of the rows, you may use the following:

labels = {S1, S2, S3, S4, S5, S6, S7, S8};

Map[labels[[#]] &, idx, {2}]

{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}

edited 2 days ago

answered 2 days ago

Henrik Schumacher

61.3k585171

$begingroup$
Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
$endgroup$
– PRG
2 days ago

$begingroup$
MANY THANKS, HENRIK!
$endgroup$
– PRG
2 days ago

$begingroup$
YOU'RE WELCOME, PRG! =D
$endgroup$
– Henrik Schumacher
2 days ago

add a comment |

The function positionDuplicates [] from How to efficiently find positions of duplicates? does the job, faster than Nearest.

(* Henrik's method *)

posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]



SeedRandom[0];  (* to make a reproducible 1000 x 1000 matrix *)

foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];



d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming

(*  {0.017, Null}  *)



d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming

(*  {0.060, Null}  *)



d1 == d2

(*  True  *)



d1

(*

  {{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},

   {173, 516}, {544, 816}, {610, 720}}

*)

answered yesterday

Michael E2

151k12203483

1

$begingroup$
Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []
$endgroup$
– Okkes Dulgerci
yesterday

$begingroup$
@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.
$endgroup$
– Michael E2
yesterday

3

$begingroup$
@OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.
$endgroup$
– Michael E2
yesterday

add a comment |

The following shows an order of magnitude difference between the most efficient ordering solution and the least efficient gatherBy solution for a 1000 x 1000 matrix as required by the OP.

ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];

sort[M_] := 

  Sort[#[[All, 2]] & /@ 

    SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];

gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];

nearest[M_] := 

  DeleteDuplicates[

   Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];

positionIndex[M_] := Values@PositionIndex@M;



SetAttributes[speedTest, HoldAll];



speedTest[functions_, opts : OptionsPattern[]] := 

 Function[inp, speedTest[functions, inp, opts], HoldAll]



speedTest[functions_, inp_, OptionsPattern[]] := Module[{

    tm = Function[fn, 

      Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],

    timesOutputs, times, sameQ},

   timesOutputs = Through[(tm /@ functions)@inp];

   times = 

    SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;

   sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);

   If[OptionValue@"CheckOutputs", 

    Labeled[times, 

     Row[{ToString@Unevaluated@inp, Spacer@80, 

       If[sameQ, Style["[Checkmark]", Green, 20], 

        Style["x", Red, 20]]}], Top], times]

   ];



Options[speedTest] = {"CheckOutputs" -> True};



M1000 = RandomInteger[{0, 1}, {1000, 1000}];



speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000

enter image description here

Note the green tick indicates the same outputs. The same order and magnitude difference is evident for a 10 0000 x 10 000 matrix with the ordering solution still the fastest but with the sort solution now becoming second fastest.

M10000 = RandomInteger[{0, 1}, {10000, 10000}];

speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000

enter image description here

edited yesterday

answered yesterday

Ronald Monson

3,2131734

add a comment |

I'd use GroupBy.

First the names of the rows: can be anything you like, for example

rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]

{S1, S2, S3, S4, S5, S6, S7, S8}

Next the grouping:

groups = GroupBy[Thread[rownames -> M], Last -> First]

<|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>

If all you need are the names:

Values[groups]

{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}

answered yesterday

Roman

6,44611134

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "387"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f197039%2ffind-the-identical-rows-in-a-matrix%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]

{{1, 8}, {2, 3, 4}, {5, 6, 7}}

In order to obtain the labels of the rows, you may use the following:

labels = {S1, S2, S3, S4, S5, S6, S7, S8};

Map[labels[[#]] &, idx, {2}]

{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}

edited 2 days ago

answered 2 days ago

Henrik Schumacher

61.3k585171

$begingroup$
Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
$endgroup$
– PRG
2 days ago

$begingroup$
MANY THANKS, HENRIK!
$endgroup$
– PRG
2 days ago

$begingroup$
YOU'RE WELCOME, PRG! =D
$endgroup$
– Henrik Schumacher
2 days ago

add a comment |

idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]

{{1, 8}, {2, 3, 4}, {5, 6, 7}}

In order to obtain the labels of the rows, you may use the following:

labels = {S1, S2, S3, S4, S5, S6, S7, S8};

Map[labels[[#]] &, idx, {2}]

{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}

edited 2 days ago

answered 2 days ago

Henrik Schumacher

61.3k585171

$begingroup$
Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
$endgroup$
– PRG
2 days ago

$begingroup$
MANY THANKS, HENRIK!
$endgroup$
– PRG
2 days ago

$begingroup$
YOU'RE WELCOME, PRG! =D
$endgroup$
– Henrik Schumacher
2 days ago

add a comment |

idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]

{{1, 8}, {2, 3, 4}, {5, 6, 7}}

In order to obtain the labels of the rows, you may use the following:

labels = {S1, S2, S3, S4, S5, S6, S7, S8};

Map[labels[[#]] &, idx, {2}]

{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}

edited 2 days ago

answered 2 days ago

Henrik Schumacher

61.3k585171

idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]

{{1, 8}, {2, 3, 4}, {5, 6, 7}}

In order to obtain the labels of the rows, you may use the following:

labels = {S1, S2, S3, S4, S5, S6, S7, S8};

Map[labels[[#]] &, idx, {2}]

{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}

edited 2 days ago

answered 2 days ago

Henrik Schumacher

61.3k585171

edited 2 days ago

answered 2 days ago

Henrik Schumacher

61.3k585171

answered 2 days ago

Henrik Schumacher

61.3k585171

answered 2 days ago

Henrik Schumacher

61.3k585171

$begingroup$
Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
$endgroup$
– PRG
2 days ago

$begingroup$
MANY THANKS, HENRIK!
$endgroup$
– PRG
2 days ago

$begingroup$
YOU'RE WELCOME, PRG! =D
$endgroup$
– Henrik Schumacher
2 days ago

add a comment |

$begingroup$
Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
$endgroup$
– PRG
2 days ago

$begingroup$
MANY THANKS, HENRIK!
$endgroup$
– PRG
2 days ago

$begingroup$
YOU'RE WELCOME, PRG! =D
$endgroup$
– Henrik Schumacher
2 days ago

Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?

– PRG
2 days ago

MANY THANKS, HENRIK!

– PRG
2 days ago

YOU'RE WELCOME, PRG! =D

– Henrik Schumacher
2 days ago

add a comment |

The function positionDuplicates [] from How to efficiently find positions of duplicates? does the job, faster than Nearest.

(* Henrik's method *)

posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]



SeedRandom[0];  (* to make a reproducible 1000 x 1000 matrix *)

foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];



d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming

(*  {0.017, Null}  *)



d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming

(*  {0.060, Null}  *)



d1 == d2

(*  True  *)



d1

(*

  {{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},

   {173, 516}, {544, 816}, {610, 720}}

*)

answered yesterday

Michael E2

151k12203483

1

$begingroup$
Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []
$endgroup$
– Okkes Dulgerci
yesterday

$begingroup$
@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.
$endgroup$
– Michael E2
yesterday

3

$begingroup$
@OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.
$endgroup$
– Michael E2
yesterday

add a comment |

The function positionDuplicates [] from How to efficiently find positions of duplicates? does the job, faster than Nearest.

(* Henrik's method *)

posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]



SeedRandom[0];  (* to make a reproducible 1000 x 1000 matrix *)

foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];



d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming

(*  {0.017, Null}  *)



d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming

(*  {0.060, Null}  *)



d1 == d2

(*  True  *)



d1

(*

  {{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},

   {173, 516}, {544, 816}, {610, 720}}

*)

answered yesterday

Michael E2

151k12203483

1

$begingroup$
Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []
$endgroup$
– Okkes Dulgerci
yesterday

$begingroup$
@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.
$endgroup$
– Michael E2
yesterday

3

$begingroup$
@OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.
$endgroup$
– Michael E2
yesterday

add a comment |

The function positionDuplicates [] from How to efficiently find positions of duplicates? does the job, faster than Nearest.

(* Henrik's method *)

posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]



SeedRandom[0];  (* to make a reproducible 1000 x 1000 matrix *)

foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];



d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming

(*  {0.017, Null}  *)



d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming

(*  {0.060, Null}  *)



d1 == d2

(*  True  *)



d1

(*

  {{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},

   {173, 516}, {544, 816}, {610, 720}}

*)

answered yesterday

Michael E2

151k12203483

The function positionDuplicates [] from How to efficiently find positions of duplicates? does the job, faster than Nearest.

(* Henrik's method *)

posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]



SeedRandom[0];  (* to make a reproducible 1000 x 1000 matrix *)

foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];



d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming

(*  {0.017, Null}  *)



d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming

(*  {0.060, Null}  *)



d1 == d2

(*  True  *)



d1

(*

  {{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},

   {173, 516}, {544, 816}, {610, 720}}

*)

answered yesterday

Michael E2

151k12203483

answered yesterday

Michael E2

151k12203483

answered yesterday

Michael E2

151k12203483

answered yesterday

Michael E2

151k12203483

1

$begingroup$
Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []
$endgroup$
– Okkes Dulgerci
yesterday

$begingroup$
@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.
$endgroup$
– Michael E2
yesterday

3

$begingroup$
@OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.
$endgroup$
– Michael E2
yesterday

add a comment |

1

$begingroup$
Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []
$endgroup$
– Okkes Dulgerci
yesterday

$begingroup$
@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.
$endgroup$
– Michael E2
yesterday

3

$begingroup$
@OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.
$endgroup$
– Michael E2
yesterday

Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1] is faster than positionDuplicates []

– Okkes Dulgerci
yesterday

@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the PositionIndex[] solutoin.

– Michael E2
yesterday

@OkkesDulgerci It's interesting that PositionIndex[] outperforms positionDuplicates[] on a list of lists, because it is still much slower on a list of integers.

– Michael E2
yesterday

add a comment |

The following shows an order of magnitude difference between the most efficient ordering solution and the least efficient gatherBy solution for a 1000 x 1000 matrix as required by the OP.

ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];

sort[M_] := 

  Sort[#[[All, 2]] & /@ 

    SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];

gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];

nearest[M_] := 

  DeleteDuplicates[

   Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];

positionIndex[M_] := Values@PositionIndex@M;



SetAttributes[speedTest, HoldAll];



speedTest[functions_, opts : OptionsPattern[]] := 

 Function[inp, speedTest[functions, inp, opts], HoldAll]



speedTest[functions_, inp_, OptionsPattern[]] := Module[{

    tm = Function[fn, 

      Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],

    timesOutputs, times, sameQ},

   timesOutputs = Through[(tm /@ functions)@inp];

   times = 

    SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;

   sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);

   If[OptionValue@"CheckOutputs", 

    Labeled[times, 

     Row[{ToString@Unevaluated@inp, Spacer@80, 

       If[sameQ, Style["[Checkmark]", Green, 20], 

        Style["x", Red, 20]]}], Top], times]

   ];



Options[speedTest] = {"CheckOutputs" -> True};



M1000 = RandomInteger[{0, 1}, {1000, 1000}];



speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000

enter image description here

M10000 = RandomInteger[{0, 1}, {10000, 10000}];

speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000

enter image description here

edited yesterday

answered yesterday

Ronald Monson

3,2131734

add a comment |

The following shows an order of magnitude difference between the most efficient ordering solution and the least efficient gatherBy solution for a 1000 x 1000 matrix as required by the OP.

ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];

sort[M_] := 

  Sort[#[[All, 2]] & /@ 

    SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];

gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];

nearest[M_] := 

  DeleteDuplicates[

   Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];

positionIndex[M_] := Values@PositionIndex@M;



SetAttributes[speedTest, HoldAll];



speedTest[functions_, opts : OptionsPattern[]] := 

 Function[inp, speedTest[functions, inp, opts], HoldAll]



speedTest[functions_, inp_, OptionsPattern[]] := Module[{

    tm = Function[fn, 

      Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],

    timesOutputs, times, sameQ},

   timesOutputs = Through[(tm /@ functions)@inp];

   times = 

    SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;

   sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);

   If[OptionValue@"CheckOutputs", 

    Labeled[times, 

     Row[{ToString@Unevaluated@inp, Spacer@80, 

       If[sameQ, Style["[Checkmark]", Green, 20], 

        Style["x", Red, 20]]}], Top], times]

   ];



Options[speedTest] = {"CheckOutputs" -> True};



M1000 = RandomInteger[{0, 1}, {1000, 1000}];



speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000

enter image description here

M10000 = RandomInteger[{0, 1}, {10000, 10000}];

speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000

enter image description here

edited yesterday

answered yesterday

Ronald Monson

3,2131734

add a comment |

The following shows an order of magnitude difference between the most efficient ordering solution and the least efficient gatherBy solution for a 1000 x 1000 matrix as required by the OP.

ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];

sort[M_] := 

  Sort[#[[All, 2]] & /@ 

    SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];

gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];

nearest[M_] := 

  DeleteDuplicates[

   Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];

positionIndex[M_] := Values@PositionIndex@M;



SetAttributes[speedTest, HoldAll];



speedTest[functions_, opts : OptionsPattern[]] := 

 Function[inp, speedTest[functions, inp, opts], HoldAll]



speedTest[functions_, inp_, OptionsPattern[]] := Module[{

    tm = Function[fn, 

      Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],

    timesOutputs, times, sameQ},

   timesOutputs = Through[(tm /@ functions)@inp];

   times = 

    SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;

   sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);

   If[OptionValue@"CheckOutputs", 

    Labeled[times, 

     Row[{ToString@Unevaluated@inp, Spacer@80, 

       If[sameQ, Style["[Checkmark]", Green, 20], 

        Style["x", Red, 20]]}], Top], times]

   ];



Options[speedTest] = {"CheckOutputs" -> True};



M1000 = RandomInteger[{0, 1}, {1000, 1000}];



speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000

enter image description here

M10000 = RandomInteger[{0, 1}, {10000, 10000}];

speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000

enter image description here

edited yesterday

answered yesterday

Ronald Monson

3,2131734

The following shows an order of magnitude difference between the most efficient ordering solution and the least efficient gatherBy solution for a 1000 x 1000 matrix as required by the OP.

ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];

sort[M_] := 

  Sort[#[[All, 2]] & /@ 

    SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];

gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];

nearest[M_] := 

  DeleteDuplicates[

   Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];

positionIndex[M_] := Values@PositionIndex@M;



SetAttributes[speedTest, HoldAll];



speedTest[functions_, opts : OptionsPattern[]] := 

 Function[inp, speedTest[functions, inp, opts], HoldAll]



speedTest[functions_, inp_, OptionsPattern[]] := Module[{

    tm = Function[fn, 

      Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],

    timesOutputs, times, sameQ},

   timesOutputs = Through[(tm /@ functions)@inp];

   times = 

    SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;

   sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);

   If[OptionValue@"CheckOutputs", 

    Labeled[times, 

     Row[{ToString@Unevaluated@inp, Spacer@80, 

       If[sameQ, Style["[Checkmark]", Green, 20], 

        Style["x", Red, 20]]}], Top], times]

   ];



Options[speedTest] = {"CheckOutputs" -> True};



M1000 = RandomInteger[{0, 1}, {1000, 1000}];



speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000

enter image description here

M10000 = RandomInteger[{0, 1}, {10000, 10000}];

speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000

enter image description here

edited yesterday

answered yesterday

Ronald Monson

3,2131734

edited yesterday

answered yesterday

Ronald Monson

3,2131734

answered yesterday

Ronald Monson

3,2131734

answered yesterday

Ronald Monson

3,2131734

add a comment |

I'd use GroupBy.

First the names of the rows: can be anything you like, for example

rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]

{S1, S2, S3, S4, S5, S6, S7, S8}

Next the grouping:

groups = GroupBy[Thread[rownames -> M], Last -> First]

<|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>

If all you need are the names:

Values[groups]

{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}

answered yesterday

Roman

6,44611134

add a comment |

I'd use GroupBy.

First the names of the rows: can be anything you like, for example

rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]

{S1, S2, S3, S4, S5, S6, S7, S8}

Next the grouping:

groups = GroupBy[Thread[rownames -> M], Last -> First]

<|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>

If all you need are the names:

Values[groups]

{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}

answered yesterday

Roman

6,44611134

add a comment |

I'd use GroupBy.

First the names of the rows: can be anything you like, for example

rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]

{S1, S2, S3, S4, S5, S6, S7, S8}

Next the grouping:

groups = GroupBy[Thread[rownames -> M], Last -> First]

<|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>

If all you need are the names:

Values[groups]

{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}

answered yesterday

Roman

6,44611134

I'd use GroupBy.

First the names of the rows: can be anything you like, for example

rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]

{S1, S2, S3, S4, S5, S6, S7, S8}

Next the grouping:

groups = GroupBy[Thread[rownames -> M], Last -> First]

<|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>

If all you need are the names:

Values[groups]

{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}

answered yesterday

Roman

6,44611134

answered yesterday

Roman

6,44611134

answered yesterday

Roman

6,44611134

answered yesterday

Roman

6,44611134

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Mathematica Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Tjyylli