Find the identical rows in a matrixHow to efficiently find positions of duplicates?How to find rows that have...
How can I place the product on a social media post better?
What happened to Captain America in Endgame?
How to pronounce 'C++' in Spanish
Symbolic Multivariate Distribution
Was there a shared-world project before "Thieves World"?
Normal Map bad shading in Rendered display
Pulling the rope with one hand is as heavy as with two hands?
Does a strong solution to a SDE imply lipschitz condition?
Can SQL Server create collisions in system generated constraint names?
How to make a pipeline wait for end-of-file or stop after an error?
Is there really no use for MD5 anymore?
What is the most expensive material in the world that could be used to create Pun-Pun's lute?
How can I practically buy stocks?
Do I have an "anti-research" personality?
How does a program know if stdout is connected to a terminal or a pipe?
In order to check if a field is required or not, is the result of isNillable method sufficient?
Why does processed meat contain preservatives, while canned fish needs not?
What is Niska's accent?
Unexpected email from Yorkshire Bank
How to reduce LED flash rate (frequency)
How to get a plain text file version of a CP/M .BAS (M-BASIC) program?
How could Tony Stark make this in Endgame?
The Defining Moment
Does Gita support doctrine of eternal cycle of birth and death for evil people?
Find the identical rows in a matrix
How to efficiently find positions of duplicates?How to find rows that have maximum value?How to do equality check of a large matrix and get the corresponding index position?List manipulation: Dropping first or last row or column of a matrixIs it possible to colour only one column of a matrix?Make a vector of sums of matrix rowsHow to operate on spans of rows in a matrix?Efficiently select the smallest magnitude element from each column of a matrixMatrix expansion and reorganisationMatrix using For LoopChanging the position of rows and columns in a matrix
$begingroup$
Suppose I have the following matrix:
M =
{{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};
TableForm[M, TableHeadings -> {{S1, S2, S3, S4, S5, S6, S7, S8}}]
In this case, it turns out that rows (S1, S8), (S2, S3, S4), (S5, S6, S7) have equal element values in identical column positions. I have a 1000 x 1000 matrix to examine and would appreciate any assistance in coding this problem.
list-manipulation matrix
$endgroup$
add a comment |
$begingroup$
Suppose I have the following matrix:
M =
{{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};
TableForm[M, TableHeadings -> {{S1, S2, S3, S4, S5, S6, S7, S8}}]
In this case, it turns out that rows (S1, S8), (S2, S3, S4), (S5, S6, S7) have equal element values in identical column positions. I have a 1000 x 1000 matrix to examine and would appreciate any assistance in coding this problem.
list-manipulation matrix
$endgroup$
3
$begingroup$
TryValues[PositionIndex[M]]
$endgroup$
– Coolwater
yesterday
$begingroup$
@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
$endgroup$
– Okkes Dulgerci
yesterday
2
$begingroup$
Possible duplicate of How to efficiently find positions of duplicates?
$endgroup$
– Michael E2
yesterday
$begingroup$
@Coolwater IMHO, the best answer is lacking so far. Please consider postingPositionIndex
as possible solution.
$endgroup$
– Henrik Schumacher
yesterday
add a comment |
$begingroup$
Suppose I have the following matrix:
M =
{{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};
TableForm[M, TableHeadings -> {{S1, S2, S3, S4, S5, S6, S7, S8}}]
In this case, it turns out that rows (S1, S8), (S2, S3, S4), (S5, S6, S7) have equal element values in identical column positions. I have a 1000 x 1000 matrix to examine and would appreciate any assistance in coding this problem.
list-manipulation matrix
$endgroup$
Suppose I have the following matrix:
M =
{{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};
TableForm[M, TableHeadings -> {{S1, S2, S3, S4, S5, S6, S7, S8}}]
In this case, it turns out that rows (S1, S8), (S2, S3, S4), (S5, S6, S7) have equal element values in identical column positions. I have a 1000 x 1000 matrix to examine and would appreciate any assistance in coding this problem.
list-manipulation matrix
list-manipulation matrix
edited 2 days ago
m_goldberg
89.2k873200
89.2k873200
asked 2 days ago
PRGPRG
1178
1178
3
$begingroup$
TryValues[PositionIndex[M]]
$endgroup$
– Coolwater
yesterday
$begingroup$
@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
$endgroup$
– Okkes Dulgerci
yesterday
2
$begingroup$
Possible duplicate of How to efficiently find positions of duplicates?
$endgroup$
– Michael E2
yesterday
$begingroup$
@Coolwater IMHO, the best answer is lacking so far. Please consider postingPositionIndex
as possible solution.
$endgroup$
– Henrik Schumacher
yesterday
add a comment |
3
$begingroup$
TryValues[PositionIndex[M]]
$endgroup$
– Coolwater
yesterday
$begingroup$
@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
$endgroup$
– Okkes Dulgerci
yesterday
2
$begingroup$
Possible duplicate of How to efficiently find positions of duplicates?
$endgroup$
– Michael E2
yesterday
$begingroup$
@Coolwater IMHO, the best answer is lacking so far. Please consider postingPositionIndex
as possible solution.
$endgroup$
– Henrik Schumacher
yesterday
3
3
$begingroup$
Try
Values[PositionIndex[M]]
$endgroup$
– Coolwater
yesterday
$begingroup$
Try
Values[PositionIndex[M]]
$endgroup$
– Coolwater
yesterday
$begingroup$
@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
$endgroup$
– Okkes Dulgerci
yesterday
$begingroup$
@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
$endgroup$
– Okkes Dulgerci
yesterday
2
2
$begingroup$
Possible duplicate of How to efficiently find positions of duplicates?
$endgroup$
– Michael E2
yesterday
$begingroup$
Possible duplicate of How to efficiently find positions of duplicates?
$endgroup$
– Michael E2
yesterday
$begingroup$
@Coolwater IMHO, the best answer is lacking so far. Please consider posting
PositionIndex
as possible solution.$endgroup$
– Henrik Schumacher
yesterday
$begingroup$
@Coolwater IMHO, the best answer is lacking so far. Please consider posting
PositionIndex
as possible solution.$endgroup$
– Henrik Schumacher
yesterday
add a comment |
4 Answers
4
active
oldest
votes
$begingroup$
idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]
{{1, 8}, {2, 3, 4}, {5, 6, 7}}
In order to obtain the labels of the rows, you may use the following:
labels = {S1, S2, S3, S4, S5, S6, S7, S8};
Map[labels[[#]] &, idx, {2}]
{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}
$endgroup$
$begingroup$
Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
$endgroup$
– PRG
2 days ago
$begingroup$
MANY THANKS, HENRIK!
$endgroup$
– PRG
2 days ago
$begingroup$
YOU'RE WELCOME, PRG! =D
$endgroup$
– Henrik Schumacher
2 days ago
add a comment |
$begingroup$
The function positionDuplicates []
from How to efficiently find positions of duplicates? does the job, faster than Nearest
.
(* Henrik's method *)
posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]
SeedRandom[0]; (* to make a reproducible 1000 x 1000 matrix *)
foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];
d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
(* {0.017, Null} *)
d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
(* {0.060, Null} *)
d1 == d2
(* True *)
d1
(*
{{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},
{173, 516}, {544, 816}, {610, 720}}
*)
$endgroup$
1
$begingroup$
Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1]
is faster thanpositionDuplicates []
$endgroup$
– Okkes Dulgerci
yesterday
$begingroup$
@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for thePositionIndex[]
solutoin.
$endgroup$
– Michael E2
yesterday
3
$begingroup$
@OkkesDulgerci It's interesting thatPositionIndex[]
outperformspositionDuplicates[]
on a list of lists, because it is still much slower on a list of integers.
$endgroup$
– Michael E2
yesterday
add a comment |
$begingroup$
The following shows an order of magnitude difference between the most efficient ordering
solution and the least efficient gatherBy
solution for a 1000 x 1000 matrix as required by the OP.
ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];
sort[M_] :=
Sort[#[[All, 2]] & /@
SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];
gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];
nearest[M_] :=
DeleteDuplicates[
Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];
positionIndex[M_] := Values@PositionIndex@M;
SetAttributes[speedTest, HoldAll];
speedTest[functions_, opts : OptionsPattern[]] :=
Function[inp, speedTest[functions, inp, opts], HoldAll]
speedTest[functions_, inp_, OptionsPattern[]] := Module[{
tm = Function[fn,
Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],
timesOutputs, times, sameQ},
timesOutputs = Through[(tm /@ functions)@inp];
times =
SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;
sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);
If[OptionValue@"CheckOutputs",
Labeled[times,
Row[{ToString@Unevaluated@inp, Spacer@80,
If[sameQ, Style["[Checkmark]", Green, 20],
Style["x", Red, 20]]}], Top], times]
];
Options[speedTest] = {"CheckOutputs" -> True};
M1000 = RandomInteger[{0, 1}, {1000, 1000}];
speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000
Note the green tick indicates the same outputs. The same order and magnitude difference is evident for a 10 0000 x 10 000 matrix with the ordering
solution still the fastest but with the sort
solution now becoming second fastest.
M10000 = RandomInteger[{0, 1}, {10000, 10000}];
speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000
$endgroup$
add a comment |
$begingroup$
I'd use GroupBy
.
First the names of the rows: can be anything you like, for example
rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]
{S1, S2, S3, S4, S5, S6, S7, S8}
Next the grouping:
groups = GroupBy[Thread[rownames -> M], Last -> First]
<|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>
If all you need are the names:
Values[groups]
{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}
$endgroup$
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "387"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f197039%2ffind-the-identical-rows-in-a-matrix%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]
{{1, 8}, {2, 3, 4}, {5, 6, 7}}
In order to obtain the labels of the rows, you may use the following:
labels = {S1, S2, S3, S4, S5, S6, S7, S8};
Map[labels[[#]] &, idx, {2}]
{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}
$endgroup$
$begingroup$
Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
$endgroup$
– PRG
2 days ago
$begingroup$
MANY THANKS, HENRIK!
$endgroup$
– PRG
2 days ago
$begingroup$
YOU'RE WELCOME, PRG! =D
$endgroup$
– Henrik Schumacher
2 days ago
add a comment |
$begingroup$
idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]
{{1, 8}, {2, 3, 4}, {5, 6, 7}}
In order to obtain the labels of the rows, you may use the following:
labels = {S1, S2, S3, S4, S5, S6, S7, S8};
Map[labels[[#]] &, idx, {2}]
{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}
$endgroup$
$begingroup$
Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
$endgroup$
– PRG
2 days ago
$begingroup$
MANY THANKS, HENRIK!
$endgroup$
– PRG
2 days ago
$begingroup$
YOU'RE WELCOME, PRG! =D
$endgroup$
– Henrik Schumacher
2 days ago
add a comment |
$begingroup$
idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]
{{1, 8}, {2, 3, 4}, {5, 6, 7}}
In order to obtain the labels of the rows, you may use the following:
labels = {S1, S2, S3, S4, S5, S6, S7, S8};
Map[labels[[#]] &, idx, {2}]
{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}
$endgroup$
idx = DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]
{{1, 8}, {2, 3, 4}, {5, 6, 7}}
In order to obtain the labels of the rows, you may use the following:
labels = {S1, S2, S3, S4, S5, S6, S7, S8};
Map[labels[[#]] &, idx, {2}]
{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}
edited 2 days ago
answered 2 days ago
Henrik SchumacherHenrik Schumacher
61.3k585171
61.3k585171
$begingroup$
Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
$endgroup$
– PRG
2 days ago
$begingroup$
MANY THANKS, HENRIK!
$endgroup$
– PRG
2 days ago
$begingroup$
YOU'RE WELCOME, PRG! =D
$endgroup$
– Henrik Schumacher
2 days ago
add a comment |
$begingroup$
Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
$endgroup$
– PRG
2 days ago
$begingroup$
MANY THANKS, HENRIK!
$endgroup$
– PRG
2 days ago
$begingroup$
YOU'RE WELCOME, PRG! =D
$endgroup$
– Henrik Schumacher
2 days ago
$begingroup$
Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
$endgroup$
– PRG
2 days ago
$begingroup$
Henrik: Can I add the S in front of the result; e.g., (S1,S8),(S3,S4),(S5,S6,S7)?
$endgroup$
– PRG
2 days ago
$begingroup$
MANY THANKS, HENRIK!
$endgroup$
– PRG
2 days ago
$begingroup$
MANY THANKS, HENRIK!
$endgroup$
– PRG
2 days ago
$begingroup$
YOU'RE WELCOME, PRG! =D
$endgroup$
– Henrik Schumacher
2 days ago
$begingroup$
YOU'RE WELCOME, PRG! =D
$endgroup$
– Henrik Schumacher
2 days ago
add a comment |
$begingroup$
The function positionDuplicates []
from How to efficiently find positions of duplicates? does the job, faster than Nearest
.
(* Henrik's method *)
posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]
SeedRandom[0]; (* to make a reproducible 1000 x 1000 matrix *)
foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];
d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
(* {0.017, Null} *)
d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
(* {0.060, Null} *)
d1 == d2
(* True *)
d1
(*
{{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},
{173, 516}, {544, 816}, {610, 720}}
*)
$endgroup$
1
$begingroup$
Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1]
is faster thanpositionDuplicates []
$endgroup$
– Okkes Dulgerci
yesterday
$begingroup$
@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for thePositionIndex[]
solutoin.
$endgroup$
– Michael E2
yesterday
3
$begingroup$
@OkkesDulgerci It's interesting thatPositionIndex[]
outperformspositionDuplicates[]
on a list of lists, because it is still much slower on a list of integers.
$endgroup$
– Michael E2
yesterday
add a comment |
$begingroup$
The function positionDuplicates []
from How to efficiently find positions of duplicates? does the job, faster than Nearest
.
(* Henrik's method *)
posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]
SeedRandom[0]; (* to make a reproducible 1000 x 1000 matrix *)
foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];
d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
(* {0.017, Null} *)
d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
(* {0.060, Null} *)
d1 == d2
(* True *)
d1
(*
{{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},
{173, 516}, {544, 816}, {610, 720}}
*)
$endgroup$
1
$begingroup$
Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1]
is faster thanpositionDuplicates []
$endgroup$
– Okkes Dulgerci
yesterday
$begingroup$
@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for thePositionIndex[]
solutoin.
$endgroup$
– Michael E2
yesterday
3
$begingroup$
@OkkesDulgerci It's interesting thatPositionIndex[]
outperformspositionDuplicates[]
on a list of lists, because it is still much slower on a list of integers.
$endgroup$
– Michael E2
yesterday
add a comment |
$begingroup$
The function positionDuplicates []
from How to efficiently find positions of duplicates? does the job, faster than Nearest
.
(* Henrik's method *)
posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]
SeedRandom[0]; (* to make a reproducible 1000 x 1000 matrix *)
foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];
d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
(* {0.017, Null} *)
d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
(* {0.060, Null} *)
d1 == d2
(* True *)
d1
(*
{{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},
{173, 516}, {544, 816}, {610, 720}}
*)
$endgroup$
The function positionDuplicates []
from How to efficiently find positions of duplicates? does the job, faster than Nearest
.
(* Henrik's method *)
posDupes[M_] := DeleteDuplicates[Sort /@ Nearest[M -> Automatic, M, {∞, 0}]]
SeedRandom[0]; (* to make a reproducible 1000 x 1000 matrix *)
foo = Nest[RandomInteger[1, {1000, 1000}] # &, 1, 9];
d1 = Cases[positionDuplicates[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
(* {0.017, Null} *)
d2 = Cases[posDupes[foo], dupe_ /; Length[dupe] > 1]; // RepeatedTiming
(* {0.060, Null} *)
d1 == d2
(* True *)
d1
(*
{{25, 75, 291, 355, 356, 425, 475, 518, 547, 668, 670, 750, 777},
{173, 516}, {544, 816}, {610, 720}}
*)
answered yesterday
Michael E2Michael E2
151k12203483
151k12203483
1
$begingroup$
Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1]
is faster thanpositionDuplicates []
$endgroup$
– Okkes Dulgerci
yesterday
$begingroup$
@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for thePositionIndex[]
solutoin.
$endgroup$
– Michael E2
yesterday
3
$begingroup$
@OkkesDulgerci It's interesting thatPositionIndex[]
outperformspositionDuplicates[]
on a list of lists, because it is still much slower on a list of integers.
$endgroup$
– Michael E2
yesterday
add a comment |
1
$begingroup$
Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1]
is faster thanpositionDuplicates []
$endgroup$
– Okkes Dulgerci
yesterday
$begingroup$
@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for thePositionIndex[]
solutoin.
$endgroup$
– Michael E2
yesterday
3
$begingroup$
@OkkesDulgerci It's interesting thatPositionIndex[]
outperformspositionDuplicates[]
on a list of lists, because it is still much slower on a list of integers.
$endgroup$
– Michael E2
yesterday
1
1
$begingroup$
Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1]
is faster than positionDuplicates []
$endgroup$
– Okkes Dulgerci
yesterday
$begingroup$
Cases[Values[PositionIndex[M]], dupe_ /; Length[dupe] > 1]
is faster than positionDuplicates []
$endgroup$
– Okkes Dulgerci
yesterday
$begingroup$
@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the
PositionIndex[]
solutoin.$endgroup$
– Michael E2
yesterday
$begingroup$
@OkkesDulgerci Yes, it is for me, too, in V12. My main point is that the solution to this question has been given in another Q&A. See this answer for the
PositionIndex[]
solutoin.$endgroup$
– Michael E2
yesterday
3
3
$begingroup$
@OkkesDulgerci It's interesting that
PositionIndex[]
outperforms positionDuplicates[]
on a list of lists, because it is still much slower on a list of integers.$endgroup$
– Michael E2
yesterday
$begingroup$
@OkkesDulgerci It's interesting that
PositionIndex[]
outperforms positionDuplicates[]
on a list of lists, because it is still much slower on a list of integers.$endgroup$
– Michael E2
yesterday
add a comment |
$begingroup$
The following shows an order of magnitude difference between the most efficient ordering
solution and the least efficient gatherBy
solution for a 1000 x 1000 matrix as required by the OP.
ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];
sort[M_] :=
Sort[#[[All, 2]] & /@
SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];
gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];
nearest[M_] :=
DeleteDuplicates[
Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];
positionIndex[M_] := Values@PositionIndex@M;
SetAttributes[speedTest, HoldAll];
speedTest[functions_, opts : OptionsPattern[]] :=
Function[inp, speedTest[functions, inp, opts], HoldAll]
speedTest[functions_, inp_, OptionsPattern[]] := Module[{
tm = Function[fn,
Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],
timesOutputs, times, sameQ},
timesOutputs = Through[(tm /@ functions)@inp];
times =
SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;
sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);
If[OptionValue@"CheckOutputs",
Labeled[times,
Row[{ToString@Unevaluated@inp, Spacer@80,
If[sameQ, Style["[Checkmark]", Green, 20],
Style["x", Red, 20]]}], Top], times]
];
Options[speedTest] = {"CheckOutputs" -> True};
M1000 = RandomInteger[{0, 1}, {1000, 1000}];
speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000
Note the green tick indicates the same outputs. The same order and magnitude difference is evident for a 10 0000 x 10 000 matrix with the ordering
solution still the fastest but with the sort
solution now becoming second fastest.
M10000 = RandomInteger[{0, 1}, {10000, 10000}];
speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000
$endgroup$
add a comment |
$begingroup$
The following shows an order of magnitude difference between the most efficient ordering
solution and the least efficient gatherBy
solution for a 1000 x 1000 matrix as required by the OP.
ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];
sort[M_] :=
Sort[#[[All, 2]] & /@
SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];
gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];
nearest[M_] :=
DeleteDuplicates[
Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];
positionIndex[M_] := Values@PositionIndex@M;
SetAttributes[speedTest, HoldAll];
speedTest[functions_, opts : OptionsPattern[]] :=
Function[inp, speedTest[functions, inp, opts], HoldAll]
speedTest[functions_, inp_, OptionsPattern[]] := Module[{
tm = Function[fn,
Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],
timesOutputs, times, sameQ},
timesOutputs = Through[(tm /@ functions)@inp];
times =
SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;
sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);
If[OptionValue@"CheckOutputs",
Labeled[times,
Row[{ToString@Unevaluated@inp, Spacer@80,
If[sameQ, Style["[Checkmark]", Green, 20],
Style["x", Red, 20]]}], Top], times]
];
Options[speedTest] = {"CheckOutputs" -> True};
M1000 = RandomInteger[{0, 1}, {1000, 1000}];
speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000
Note the green tick indicates the same outputs. The same order and magnitude difference is evident for a 10 0000 x 10 000 matrix with the ordering
solution still the fastest but with the sort
solution now becoming second fastest.
M10000 = RandomInteger[{0, 1}, {10000, 10000}];
speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000
$endgroup$
add a comment |
$begingroup$
The following shows an order of magnitude difference between the most efficient ordering
solution and the least efficient gatherBy
solution for a 1000 x 1000 matrix as required by the OP.
ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];
sort[M_] :=
Sort[#[[All, 2]] & /@
SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];
gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];
nearest[M_] :=
DeleteDuplicates[
Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];
positionIndex[M_] := Values@PositionIndex@M;
SetAttributes[speedTest, HoldAll];
speedTest[functions_, opts : OptionsPattern[]] :=
Function[inp, speedTest[functions, inp, opts], HoldAll]
speedTest[functions_, inp_, OptionsPattern[]] := Module[{
tm = Function[fn,
Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],
timesOutputs, times, sameQ},
timesOutputs = Through[(tm /@ functions)@inp];
times =
SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;
sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);
If[OptionValue@"CheckOutputs",
Labeled[times,
Row[{ToString@Unevaluated@inp, Spacer@80,
If[sameQ, Style["[Checkmark]", Green, 20],
Style["x", Red, 20]]}], Top], times]
];
Options[speedTest] = {"CheckOutputs" -> True};
M1000 = RandomInteger[{0, 1}, {1000, 1000}];
speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000
Note the green tick indicates the same outputs. The same order and magnitude difference is evident for a 10 0000 x 10 000 matrix with the ordering
solution still the fastest but with the sort
solution now becoming second fastest.
M10000 = RandomInteger[{0, 1}, {10000, 10000}];
speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000
$endgroup$
The following shows an order of magnitude difference between the most efficient ordering
solution and the least efficient gatherBy
solution for a 1000 x 1000 matrix as required by the OP.
ordering[M_] := Sort@SplitBy[Ordering@M, M[[#]] &];
sort[M_] :=
Sort[#[[All, 2]] & /@
SplitBy[SortBy[Thread[{M, Range@Length@M}], First], First]];
gatherBy[M_] := GatherBy[Range@Length@M, M[[#]] &];
nearest[M_] :=
DeleteDuplicates[
Sort /@ Nearest[M -> Automatic, M, {[Infinity], 0}]];
positionIndex[M_] := Values@PositionIndex@M;
SetAttributes[speedTest, HoldAll];
speedTest[functions_, opts : OptionsPattern[]] :=
Function[inp, speedTest[functions, inp, opts], HoldAll]
speedTest[functions_, inp_, OptionsPattern[]] := Module[{
tm = Function[fn,
Function[x, <|ToString[fn] -> RepeatedTiming@fn@x|>]],
timesOutputs, times, sameQ},
timesOutputs = Through[(tm /@ functions)@inp];
times =
SortBy[Query[All, All, First]@timesOutputs, Last] // Dataset;
sameQ = SameQ @@ (Query[All, Last, 2]@timesOutputs);
If[OptionValue@"CheckOutputs",
Labeled[times,
Row[{ToString@Unevaluated@inp, Spacer@80,
If[sameQ, Style["[Checkmark]", Green, 20],
Style["x", Red, 20]]}], Top], times]
];
Options[speedTest] = {"CheckOutputs" -> True};
M1000 = RandomInteger[{0, 1}, {1000, 1000}];
speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M1000
Note the green tick indicates the same outputs. The same order and magnitude difference is evident for a 10 0000 x 10 000 matrix with the ordering
solution still the fastest but with the sort
solution now becoming second fastest.
M10000 = RandomInteger[{0, 1}, {10000, 10000}];
speedTest[{gatherBy, positionIndex, nearest, ordering, sort}]@M10000
edited yesterday
answered yesterday
Ronald MonsonRonald Monson
3,2131734
3,2131734
add a comment |
add a comment |
$begingroup$
I'd use GroupBy
.
First the names of the rows: can be anything you like, for example
rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]
{S1, S2, S3, S4, S5, S6, S7, S8}
Next the grouping:
groups = GroupBy[Thread[rownames -> M], Last -> First]
<|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>
If all you need are the names:
Values[groups]
{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}
$endgroup$
add a comment |
$begingroup$
I'd use GroupBy
.
First the names of the rows: can be anything you like, for example
rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]
{S1, S2, S3, S4, S5, S6, S7, S8}
Next the grouping:
groups = GroupBy[Thread[rownames -> M], Last -> First]
<|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>
If all you need are the names:
Values[groups]
{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}
$endgroup$
add a comment |
$begingroup$
I'd use GroupBy
.
First the names of the rows: can be anything you like, for example
rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]
{S1, S2, S3, S4, S5, S6, S7, S8}
Next the grouping:
groups = GroupBy[Thread[rownames -> M], Last -> First]
<|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>
If all you need are the names:
Values[groups]
{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}
$endgroup$
I'd use GroupBy
.
First the names of the rows: can be anything you like, for example
rownames = Array[ToExpression["S" <> ToString[#]] &, Length[M]]
{S1, S2, S3, S4, S5, S6, S7, S8}
Next the grouping:
groups = GroupBy[Thread[rownames -> M], Last -> First]
<|{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} -> {S1, S8},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0} -> {S2, S3, S4},
{0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0} -> {S5, S6, S7}|>
If all you need are the names:
Values[groups]
{{S1, S8}, {S2, S3, S4}, {S5, S6, S7}}
answered yesterday
RomanRoman
6,44611134
6,44611134
add a comment |
add a comment |
Thanks for contributing an answer to Mathematica Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f197039%2ffind-the-identical-rows-in-a-matrix%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
3
$begingroup$
Try
Values[PositionIndex[M]]
$endgroup$
– Coolwater
yesterday
$begingroup$
@Coolwater If there is a unique row, your method will fail. At least one needs to delete if list has length 1
$endgroup$
– Okkes Dulgerci
yesterday
2
$begingroup$
Possible duplicate of How to efficiently find positions of duplicates?
$endgroup$
– Michael E2
yesterday
$begingroup$
@Coolwater IMHO, the best answer is lacking so far. Please consider posting
PositionIndex
as possible solution.$endgroup$
– Henrik Schumacher
yesterday